PSYCHOLOGICAL REVIEW 


PERSONALITY, ROLE, MOOD, AND 
SITUATION-PERCEPTION: 
A UNIFYING THEORY OF MODULATORS 


RAYMOND B. CATTELL 
University of Illinois 


Separation of personality and role is important both conceptually and 
for valid personality measurement. The concept of role stands at a 
crossroad where formulation of the group, perception, and the psycho- 
logical situation intersect. The last is handled by distinguishing focal, 
ambient, and global stimulus situations, and representing them by 
situational indices obtained under different, controlled factor analytic 
experiments. Change of behavior in a role can be conceptualized as 
a change of personality or of perception or both. Formally, mood 
change can be brought with role change under a general concept of 
modulators, expressible as a set of weight changes in the behavior 
specification equation. The modulator concept is considered in the 
context of evaluating traits across stimulus situations, responses, and 
temporal occasions, in the BDR matrix. 


DEFINITIONS OF PERSONALITY, 
ROLE, AND GROUP 


The purpose of this paper is to 
present a concept and a model for 
handling role phenomena. To be 
precise about role, it is necessary that 
we bring concepts about personality, 
role, mood, the definition of the situa- 
tion, and the perception of the situa- 
tion into a single theoretical system. 
Most of the elements of this system 
have, however, been made precise 
elsewhere. Thus, for personality, we 
shall begin with the definition (Cat- 
tell, 1946, 1950), “That which permits 
prediction of individual differences— 
freed of intraindividual variation—of 
response in a defined situation.” This 
is expressed in its broadest form in the 
stimulus-response model: 

$$p_{ji} = f(O_i, S_j), \qquad [1]$$

which states that any response 
performance, $p_j$, is a function of the 
properties of the personality (or 
organism), $O_i$, and the situation, $S_j$. 
However, a multivariate experiment, 
leading to more refined conceptualiza- 
tion of the stimulus (Cattell, 1961a), 
and psychologically enriched knowl- 
edge of source traits (factors), (Cat- 
tell, 1946; Guilford, 1959; Thurstone, 
1938) has shown with confidence that 
this can be developed into the more 
powerful, explicit model in which O 
and S are themselves expressed in 
dimensions. This new understanding 
of the response in relation to per- 
sonality, stated in its simplest form of 
a linear and additive equation, now 
becomes: 


$$p_{ji} = s_{j1}T_{1i} + s_{j2}T_{2i} + \cdots + s_{jk}T_{ki} + s_jT_{ji} \qquad [2]$$

In this specification equation, $p_{ji}$ is 
the response performance magnitude 
of the person, $i$, in the situation $s_j$; 
$s_{j1}$, $s_{j2}$, etc., are the situational indices 
(factor loadings) found by factoring, 
for the variable $p_j$; and the $T$s are the 
$k$ simple structure common factors (to 
which $T_j$, a specific, is added) 
subtending the personality sphere. This 
permits the personality, $P$, to be 
expressed as a profile of dimensions as 
follows: 

$$P_i = T_{1i}, T_{2i}, \ldots, T_{ki} \qquad [3]$$

(In standard scores, and neglecting 
the specific factors, $T_j$, etc., here 
and henceforth, for general simplicity 
of representation.) 
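
As an illustration of Equation 2, the following minimal Python sketch computes a standard-score response as the weighted sum of trait scores, neglecting the specific factor. The function name `predict_response` and all numerical values are assumptions invented for the example; they are not data from the studies cited.

```python
def predict_response(loadings, traits):
    """Return p_ji = sum over k of s_jk * T_ki (specific factor neglected)."""
    if len(loadings) != len(traits):
        raise ValueError("need one situational index per trait")
    return sum(s * t for s, t in zip(loadings, traits))


# Hypothetical situational indices s_j1..s_j3 for one situation j.
s_j = [0.5, 0.3, -0.2]
# Hypothetical standard scores T_1i..T_3i for one person i.
T_i = [1.0, -0.5, 2.0]

p_ji = predict_response(s_j, T_i)
print(round(p_ji, 3))
```

The same person profile `T_i` can be re-used with the loadings of any other situation, which is what makes the profile of Equation 3 a general description of the person.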

For initial simplicity, we shall 
consider that, in our population, there 
is always one typical kind of response, 
$p_j$, to the situation $s_j$ (which is 
physically defined as $s_{(j)}$), and that people 
are simply scored on quantitative 
differences in response magnitude. For 
such an invariable link, $(s_j - p_j)$, we 
shall formally use the Thorndikian 
term, “bond” (subsuming reflex), 
though later we can generalize from 
bonds to varied combinations of 
stimuli and responses, i.e., permit the 
breakdown of bonds into separate 
stimulus and response elements. 

As conceptually developed else- 
where (Cattell, 1946, 1957, 1961a, 
1961b), the psychological meaning, $s_j$, 
of the physical situation, $s_{(j)}$, for a 
typical member of a population $r$, can 
similarly be written as a vector: 

$$s_{jr} = s_{j1}, s_{j2}, \ldots, s_{jk} \qquad [4]$$


the $s$'s being called “situational 
indices” and experimentally determined 
as factor loadings for the variable or 
bond $(s_j - p_j)$. It should be noted 
that this experiment must be carried 
out under those manipulative 
conditions which do not involve any 
difficulty whatever for the subject in 
making the response, $p_j$. For otherwise, 
the magnitude of the subject's response would 
reflect partly the difficulties of making 
the response, and we are concerned 
here only with perception. In fact we 
should strictly define the values in the 
vector as perceptual situational indices. 

Turning next to the definition of 
role, one must recognize that it locks 
into a definition of group, group 
syntality, role perception, and role 
contribution to syntality, for which, 
however, the reader will be referred 
elsewhere (Cattell, 1961b; Cattell & 
Stice, 1960). It must be recognized 
that, at a common sense level, four 
kinds of role can be defined, in a four- 
fold table, created by two dichotomies, 
namely, temporary versus permanent 
and common versus unique. 

A temporary role, such as a man 
puts on when he leaves his office and 
goes home to act as a father, or such 
as a friend puts on when he formally 
mediates in a quarrel between two 
friends, involves certain mental sets, 
the adoption of which is reversible. 
A permanent role, such as that of an 
occupation, involves habits that are 
permanently in action, i.e., do not 
alternate with different responses to 
the same situation. The latter is 
harder to distinguish from personality, 
the distinction resting on a definition 
of personality in the rest of the group 
not having this particular role. 

A common role is one which can be 
defined as one of a class of modal 
patterns in behavior discoverable in a 
group. All members of a certain 
subset of the group have this pattern. 
By contrast a unique role is one which 
only a given individual adopts, and 
which is most conspicuous in someone 
in a unique position (Mohammed, 
Napoleon). Obviously the most oper- 
ationally difficult separation of role 
from personality presents itself in this 
type of role which is both unique and 
permanent; but the concepts and 
operations offered here suffice to do so. 
A common role is learned and devel- 
oped in connection with a socio- 
anthropological, institutional organ in 
the body of a cultural pattern. The 
unique role is a massive response to 
some equally real complex of situa- 
tions in the life of the individual. In 
either case, we define a role as that 
which causes a characteristic change 
in response to a whole complex (series) 
of situations from the values char- 
acteristic of the person when he is not 
in the role (temporary role) or of 
others who are never in the role 
(permanent role). This will be spelled 
out in measurement terms as we 
proceed. 

Confining ourselves for the moment 
to a common role, for simplicity of 
exposition, and because it is the more 
important, we note that two condi- 
tions are necessary for locating and 
defining any particular role instance, 
as follows. 

1. Taking a wide sample of social 
and general behavior measures, and a 
wide sample of people, one must locate 
the modal pattern itself by Q’ tech- 
nique (distinguished from Q technique 
in that it is carried only to cluster 
analysis, not to factor analysis, as in 
true Q technique) (Cattell, 1957). 
In this Q’ technique, further, one must 
use $r_p$ or some similar index, instead 
of $r$, which neglects differences of 
absolute level of patterns (Cattell, 
1957, 1961b). The centroid of any one 
of the clusters found defines the role, 
$R_r$, as a vector or profile for that 
cluster, as follows: 

$$R_r = p_{1r}, p_{2r}, \ldots, p_{nr} \qquad [5]$$

in which the $p$ values are scores on 
“ties,” i.e., role relation acts, which, 
in fact, are at modes or optima for 
performance of the role. 

2. It must be shown that, for the 
group of people so located, the p 
values so obtained are characteristically 
significantly shifted, in the 
case of a temporary role, from what 
would be expected from the sheer per- 
sonality endowments of this group of 
individuals, alone. In the case of a 
permanent role, the equivalent dem- 
onstration would be that other persons 
of the same personality measurement 
react to these situations with score 
values which are essentially the same 
as the means for the general popula- 
tion and different from those of role 
incumbents. 
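
The reason Condition 1 asks for an index sensitive to absolute level can be shown with a small sketch. It compares the ordinary correlation $r$ with a plain squared distance between two profiles. The squared distance is only a stand-in chosen for illustration and is not the $r_p$ coefficient itself; the profiles and function names are likewise assumptions.

```python
import math


def pearson_r(x, y):
    """Ordinary product-moment correlation between two profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def squared_distance(x, y):
    """Sum of squared element differences; sensitive to level as well as shape."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


# Two hypothetical profiles with identical shape but different absolute level.
profile_a = [1.0, 2.0, 3.0, 2.0]
profile_b = [3.0, 4.0, 5.0, 4.0]

r = pearson_r(profile_a, profile_b)
d2 = squared_distance(profile_a, profile_b)
print(r, d2)
```

Here the correlation treats the two profiles as identical, while the distance separates them, so a clustering based on $r$ alone would merge groups that differ only in level of response.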

Condition 2 is a necessary part of 
the definition because the types dis- 
covered by the first operation obvi- 
ously need to be separated into those 
which are of a biological or clinical 
nature, and those which are of a 
sociological-role nature. For example, 
Huntington’s chorea, a racial type, 
or schizophrenia would also be caught in 
the net of the basic Q’ technique 
operation of finding species types 
(Cattell, 1957). The second condition 
above, plus the later demonstration or 
check that the patterns are culturally 
relative, is necessary to separate roles 
from other types. 

Any further research on a role, e.g., 
on the way it is learned, the dynamic 
satisfactions it gives, the contribution 
it makes to group syntality, the way 
it is perceived, must hinge first on this 
location and definition of a special 
kind of type. An important thing to 
bear in mind in these further develop- 
ments is that the behavioral-percep- 
tual “ties,” which we have written 
here simply as p’s (dropping the s 
part now as taken for granted), relat- 
ing the role incumbent to the rest of 
the group, and to his physical 
environment, are typically two-way, mutual 
responses when directed to other mem- 
bers of the group. That is to say, we 
should really always write p and p', 
where performances have to do with 
people, the p’ representing the 
reaction of the other person to the role 
incumbent. However, the p’ is not 
essential to our main development, 
and, in the space of the present article, 
it can be handled only by implication. 


FOCAL, AMBIENT, AND GLOBAL 
SITUATIONS 


Although for simplicity we shall 
continue in this present section to 
compare the ordinary personality 
reaction when the person is out of a 
role with his specific reaction when he 
is in a role, yet we recognize that our 
introductory position is an over- 
simplification and that our ultimate 
theory must accept that no action is 
ever performed entirely out of a role. 
The real base level for personality and 
indeed the definition of personality, is 
that it deals with an abstraction rep- 
resenting an averaging across roles. 
However, for convenience, one can 
recognize that some role effects are so 
slight that for practical purposes one 
treats the person as reacting out of 
a role. 

Although the problems of the first 
step above—the Q’ technique location 
of types—are by no means slight and 
involve the problem of weighting 
elements and recognizing types in 
varying numbers of dimensions, they 
are sufficiently dealt with elsewhere 
(Burt, 1940; Cattell, 1957, 1961b). 
Our first concern in this article is with 
organizing principles for phenomena of 
the second kind, namely, the changes 
in behavior on stepping into and out of 
a role. The same physical situation, 
$s_{(j)}$, is reacted to differently in and out 
of the role. A policeman out of uni- 
form may respond to a driver going 
through a red light with the startled 
annoyance of any other citizen. But 
on point duty and in uniform, he re- 
acts to it much more strongly. 
Incidentally, our mathematical model here 
will deal only with ties where the role 
changes the intensity of the response, 
not its whole nature. The existence of 
the latter kind of change does not pre- 
clude an accurate treatment by way 
of the former. 

In addition to meeting the same 
physical situation both in and out of 
a role, the person in the role may 
encounter and react to situations 
which simply do not exist for other 
people. Incidentally, in regard to our 
taxonomy of roles, we may notice that 
this is far more common in the 
permanent role. For example, many 
surgeons, but few other role incum- 
bents, find themselves presented with 
an anesthetized patient early on 
Monday mornings. Except for a 
farcical, or artificial experimental 
manipulation, variables of the latter 
kind will have no importance or 
reality for the member of the general 
population. Both of these ‘‘qualita- 
tive” differences can actually be 
brought under our general quantita- 
tive treatment, the former by having a 
variety of dimensions covering differ- 
ent qualitative responses and the 
latter by assigning the general public 
a zero score on the role-peculiar ties. 

For formulation, let us symbolize 
the purely physical situation by $s_{(j)}$, 
and its perceptual meaning by $s_j$ as 
defined for the ordinary person, and 
as $s_{jr}$ for persons in the role $R$. In 
words, we will call $s_j$ the “focal 
stimulus meaning” and $s_{jr}$ the “global 
stimulus meaning,” for the same 
physical focal stimulus is differently 
perceived in the global context of the 
role. This implies that in the latter 
there exists, for the role-trained person 
only, some additional ambient 
stimulus or role cue which is capable of 
converting the simple focal stimulus 
meaning into the modified global 
stimulus meaning, which indicates 
that he must react to it as being in a 
role. As we shall see, this ambient 
stimulus can be several things, e.g., 
something that occurs before the focal 
stimulus is re-encountered, but, 
whatever its nature, it results, for the 
role-trained person, in an increment of 
meaning which we can formulate as 
follows: 

$$s_j + s_{rj} = s_{jr} \qquad [6]$$
(focal) + (ambient, role cue) = (global) 


Experimentally, the value for $s_{rj}$ is to 
be found by first finding $s_j$ and $s_{jr}$. As 
the multivariate experimental model 
for the meaning of a situation (Cattell, 
1961b) reminds us, the values for $s_j$ 
and $s_{jr}$ are to be derived from the 
magnitudes of the responses $p_j$ and 
$p_{jr}$. The difference of the $p_j$ and $p_{jr}$ 
responses exists at two parts of the 
specification Equation 2 above. For 
the general population there is a 
difference in the raw score means from 
which the standard scores of the $p_{jr}$'s 
and the $p_j$'s begin, and a difference 
in the manner of personality involve- 
ment in the act, as expressed in the 
rate of change of the magnitude of 
response, p, in relation to the rate of 
change of personality factor level, 
i.e., as expressed by the tangents or s 
values in the specification equation. 

Psychologically to illustrate the 
latter, we might note that the severity 
of response may increase more rapidly 
with a personality endowment in 
dominance, E, and super ego strength, 
G, in a leadership role (Cattell & 
Stice, 1960)—such as the policeman 
cited—than in the general popula- 
tion situation. The values for these 
changes will be obtained in experi- 
mentally slightly different ways ac- 
cording to whether we are dealing with 
temporary or permanent roles. To 
calculate the former we take the same 
set of people and include in one and 
the same factor analysis a set of 
variables, $s_1$ through $s_n$, encountered both 
in (the upper half of the matrix) and 
out (the lower half of the matrix 
labeled $s_{1r}$ to $s_{nr}$) of the role situation. 
The specification equation, i.e., the 
factor loadings, will now be different, 
for the same population of people, for 
corresponding, i.e., $s_j$ and $s_{jr}$, rows of 
the factor matrix. 

To obtain a corresponding com- 
parison for permanent roles we take 
instead a sample of people in the role 
and a sample of people out of the role 
who have the same personality dis- 
tribution, and obtain the differing 
means and loadings in the two 
situations. The specification equation 
for out of the role behavior has already 
been written in Equation 2 above, and 
that for the role-peculiar behavior, 
where $p_{jr}$ is a response to $s_{jr}$, we will 
write as follows: 

$$p_{jri} = s_{jr1}T_{1i} + s_{jr2}T_{2i} + \cdots + s_{jrk}T_{ki} \qquad [7]$$


Because of the essential similarity 
of the change from Equation 2 to Equation 7 
to modulation, as in transposing a key in 
music, we shall refer to the changes 
between the $s$ values in Equation 2 and 
those in Equation 7 as produced by a 
“modulating vector.” The definition 
of this vector is in principle simply 
the difference between the vector 
presented by the $s$'s in Equation 2 and 
by the $s$'s in Equation 7, as 
represented in Figure 1. However, we 
have to remember that these are in 
standard scores, and that both the 
means and the sigmas of the responses 
are different in the situations $s_j$ and $s_{jr}$. 
If our objective is to predict the 
difference in magnitude of raw score 
response in and out of the role ($X_{p_{jr}}$ 
and $X_{p_j}$), then we must use Equation 
8, in which $\bar{X}_{p_j}$ and $\bar{X}_{p_{jr}}$ are the mean 
levels of response in the two situations, 
and the sigmas are the variability of 
response, respectively, within and 
outside of the role situation. 

$$X_{p_{ji}} - X_{p_{jri}} = (s_{j1}\sigma_{p_j} - s_{jr1}\sigma_{p_{jr}})T_{1i} + \cdots + (s_{jk}\sigma_{p_j} - s_{jrk}\sigma_{p_{jr}})T_{ki} + (\bar{X}_{p_j} - \bar{X}_{p_{jr}}) \qquad [8]$$

[Figure 1. Role modulation vector for $s_j$, drawn on the axes Factor $T_1$ and Factor $T_2$. Role effect on one behavior response, expressed as a modulating vector found as the difference of in-role and out-role vectors.] 
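
The raw-score comparison of Equation 8 can be checked numerically with a short sketch. It assumes only that a raw score is the situation mean plus the situation sigma times the standard-score prediction; the function names, means, sigmas, loadings, and trait scores are all hypothetical values chosen for the example.

```python
def standard_score(loadings, traits):
    """Standard-score response from the specification equation."""
    return sum(s * t for s, t in zip(loadings, traits))


def raw_score(mean, sigma, loadings, traits):
    """Raw score = mean + sigma * standard score (assumed scaling)."""
    return mean + sigma * standard_score(loadings, traits)


def raw_difference(mean_out, sigma_out, s_out, mean_in, sigma_in, s_in, traits):
    """Out-of-role minus in-role raw score, written term by term as in Equation 8."""
    weighted = sum(
        (so * sigma_out - si * sigma_in) * t
        for so, si, t in zip(s_out, s_in, traits)
    )
    return weighted + (mean_out - mean_in)


traits = [1.0, -1.0]                      # hypothetical T_1i, T_2i
s_out, mean_out, sigma_out = [0.4, 0.2], 10.0, 2.0   # out of role
s_in, mean_in, sigma_in = [0.6, 0.5], 14.0, 3.0      # in role

direct = raw_score(mean_out, sigma_out, s_out, traits) - raw_score(
    mean_in, sigma_in, s_in, traits
)
via_eq8 = raw_difference(mean_out, sigma_out, s_out, mean_in, sigma_in, s_in, traits)
print(round(direct, 3), round(via_eq8, 3))
```

Both routes give the same difference, which shows that the sigmas must multiply the loadings before they are subtracted whenever the two situations differ in variability.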


Incidentally, it should be noted that 
since persons in permanent roles will 
naturally tend somewhat to have been 
selected for personality, this equation, 
in the case of the ‘‘permanent role 
modulation,” will need to be obtained 
by taking a group from the general 
population of the same $T$ score means 
and ranges as in the role group. 
Returning to standard scores and a 
comparison of Equations 2 and 7 we 
next note that just as a meaning 
vector (Cattell, 1961b) can be given to 
any ordinary situation, as in Equation 
4 above, so now we can give a meaning 
to the ambient situation or role cue, by 
writing Equation 9. 

$$s_{rj} = (s_{jr1} - s_{j1}), (s_{jr2} - s_{j2}), \ldots, (s_{jrk} - s_{jk}) \qquad [9]$$

Or, if we express each of these 
differences in brackets by a new single 
symbol, $s_{rj}$, then, 

$$s_{rj} = s_{rj1}, s_{rj2}, \ldots, s_{rjk} \qquad [10]$$

The relationship of Equations 2, 7, 
and 10 is expressed in Figure 1, 
though it must be remembered that 
this is a simplification and that a full 
representation would require the use 
of Equation 8, taking account also 
of the difference of mean level of 
response. However, provided this is 
kept in mind, and keeping to 
essentials, we may say that our definition 
of role permits the designation of a 
particular role by means of its 
modulating vector. To be exact, or, 
at least, cautious, we should say that 
a role is defined and designated by a 
whole set of modulating “vectors.” 
For there is, as far as we know, a 
different modulating vector for each 
$(s_j - p_j)$ tie. 
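
The definition of the modulating vector as an element-by-element difference of loadings (Equations 6 and 9) can be sketched as follows. The function name `modulating_vector` and the loading values are assumptions made for the illustration.

```python
def modulating_vector(in_role, out_role):
    """Return s_rj: in-role loadings minus out-of-role loadings, factor by factor."""
    if len(in_role) != len(out_role):
        raise ValueError("both loading vectors must cover the same factors")
    return [a - b for a, b in zip(in_role, out_role)]


# Hypothetical loadings of one tie on three factors.
s_j = [0.4, 0.2, 0.1]    # focal meaning, out of role
s_jr = [0.6, 0.5, 0.1]   # global meaning, in role

s_rj = modulating_vector(s_jr, s_j)

# Equation 6 read in reverse: focal + ambient should give back the global meaning.
recovered = [f + m for f, m in zip(s_j, s_rj)]
print([round(v, 3) for v in s_rj])
```

A zero entry, as on the third factor here, means that the role leaves the involvement of that trait in this particular tie unchanged.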

It would, of course, be delightful if 
experiment should find that these 
modulating vectors are the same for 
all situations affected by a particular 
role, but this is most unlikely! It is 
more probable that the specific char- 
acter of a role will have to be ex- 
pressed, coarsely and incompletely, by 
finding the mean modulation across a 
standard, stratified sample of $s_j$'s, 
chosen to be measurable for all roles, or 
by taking the whole matrix of modu- 
lating vectors for this standard set of 
variables. Since it would be safer 
with present knowledge to assume the 
latter we shall speak of a role 
modulation matrix. This role matrix, as 
illustrated in Table 1, gives for each 
role the means of calculating the 
behavior in the role situation from the 
known general situational indices for 
the focal situations, as obtained for 
the general population before entering 
the role. In defining the concept it 
also gives it scientific, operational 
effectiveness, as any definition should. 

TABLE 1 

FORM OF A ROLE MODULATION 
(TRANSFORMATION) MATRIX 

Situation | Tie         | Trait: $T_1$ | $T_2$     | ... | $T_r$     | $T_k$ 
1         | $s_1 - p_1$ | $s_{1r1}$    | $s_{1r2}$ | ... | $s_{1rr}$ | $s_{1rk}$ 
2         |             |              |           |     |           | 
3         |             |              |           |     |           | 
j         | $s_j - p_j$ | $s_{jr1}$    | $s_{jr2}$ | ... | $s_{jrr}$ | $s_{jrk}$ 
N         | $s_n - p_n$ | $s_{nr1}$    | $s_{nr2}$ | ... | $s_{nrr}$ | $s_{nrk}$ 

Note. $s_j - p_j$ defines the situation $j$ and the response 
to it, $p_j$. $s_{jrk}$ is the modification of the loading 
of factor $k$ through the focal situation $s_j$ being 
encountered in the role state $r$. $T_r$, specifically a factor for role, 
loaded only in the role situation performances, is 
introduced here in anticipation of its later discussion. 
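
A role modulation matrix of the kind shown in Table 1 can be used in two ways: row by row, to turn focal loadings into in-role loadings, or averaged across ties as a coarse description of the role. The sketch below illustrates both. The tie labels, loadings, and function names are hypothetical, and only two factors and two ties are used to keep the example small.

```python
# Focal (out-of-role) loadings for two hypothetical ties on two factors.
focal_loadings = {"s1-p1": [0.4, 0.2], "s2-p2": [0.1, 0.5]}

# Role modulation matrix: one modulating vector per tie.
role_modulation = {"s1-p1": [0.2, 0.3], "s2-p2": [0.0, -0.1]}


def in_role_response(tie, traits):
    """Predict the in-role standard score from focal loadings plus modulation."""
    loadings = [f + m for f, m in zip(focal_loadings[tie], role_modulation[tie])]
    return sum(s * t for s, t in zip(loadings, traits))


def mean_modulation(matrix):
    """Average the modulating vectors over ties, factor by factor."""
    rows = list(matrix.values())
    return [sum(column) / len(rows) for column in zip(*rows)]


traits = [1.0, -1.0]                 # hypothetical T_1i, T_2i
p_in = in_role_response("s1-p1", traits)
average = mean_modulation(role_modulation)
print(round(p_in, 3), [round(v, 3) for v in average])
```

The averaged vector loses the differences between rows, which is why the text treats the full matrix as the safer description with present knowledge.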


ALTERNATIVES OF CONSIDERING 
MODULATION AS PERSONALITY 
AND PERCEPTION CHANGE 


The above formulation has tem- 
porarily written a role change actually 
as a perception change—for s's are 
situational indices and, under special 
circumstances, situation perception in- 
dices (Cattell, 1961b). Yet it is still 
open to us to modify the formulation 
according to whether, in further 
theoretical development, we decide to 
consider stepping into a role as some- 
thing that produces a personality 
change or merely a change in per- 
ception. 

A digression into the meaning of 
perception is here essential. Paren- 
thetically, since the New Look epi- 
demic is still affecting some writers, it 
may be necessary categorically to 
disclaim here any intention of using 
perception as an introspection-derived 
term. We should indeed go further 
and consider Scheier’s proposition in 
which he has reminded the New Look 
excursionists that, even as a be- 
haviorally derived concept, perception 
may be redundant. His argument is, 
that if we enter the specification 
equation, knowing what the physical 
situation is and what the personality 
dimensions of the individual are, any 
reference to perception becomes un- 
necessary in prediction. For example, 
to say that a subject of IQ 140 will 
have a perception of the above 
sentence different from that of one of 
IQ 100, or that sly meanings will be 
read into a simple joke by a person 
of strong sexual disposition, does not 
require any new concepts; for this 
perception is implied in the situational 
values (s’s) and the known implica- 
tions of the concepts of intelligence 
and of sexual disposition. The speci- 
fication equation operates effectively, 
as Scheier points out, with only two 
kinds of terms, for most psychometric 
purposes. But while agreeing that 
some absurdities have been per- 
petrated in using perception as a 
tertium quid, we yet believe that 
there is some value in experimenting 
with a model which treats a percep- 
tion, or at least a tendency to perceive, 
lasting for some appreciable time, as 
something having a special kind 
of independence of the personality. 
What is perhaps new in the present 
treatment is a definition of perception 
which makes it neither a free floating 
superfluous concept nor a simple 
derivation from personality, but a 
complex emergent from situation and 
personality, and the bringing of other 
phenomena into a general class with 
perception as “modulators.” 

In introducing this concept of 
modulators, it is not proposed that the 
psychologist consider the determina- 
tion of perception as ultimately resid- 
ing in anything other than the two 
fundamental terms which at present 
define it, namely, the nature of the 
organism (at present only defined by 
the traits, Ts), and the nature of the 
stimulus situation physically (at pres- 
ent reflected in the s’s). But it is 
proposed that temporary conditions 
be introduced as concepts which 
operate as temporary but systematic 
modulators of either s’s or Ts. First 
let us demonstrate our freedom, in 
theory, as stated above, to handle the 
change of perception in a role situation 
by either a change in the personality 
(which ipso facto changes perception) 
or a change only in the perceived 
meaning of the situation itself. To 
see what can happen over all terms, 
let us first consider only one term 
from the specification equation, as 
set out in Equation 11 below. Al- 
though we have computationally, by 
our factor analytic model, obtained 
changes in the situational indices, yet 
once we have these values we are free 
to reinterpret the change arising in the 
total of the product term as a change 
in T, instead of a change in s, in the 
following equivalence. 


$$(s_{jr1} - s_{j1})T_{1i} = s_{j1}(T_{1ri} - T_{1i}) \qquad [11]$$

That is to say, if we prefer to 
retain the ordinary meaning of the 
given situation, $s_{(j)}$, by continuing 
to write $s_{j1}$, $s_{j2}$, etc., unchanged (as 
in Equation 5 above and on the right 
of Equation 11), we can change the $T$ 
values instead, by making $T_{1ri}$ equal 
to $T_{1i}$ multiplied by $s_{jr1}$ over $s_{j1}$. 
Thus we change the personality of a 
person entering the role, but not the 
general perceptual values for the 
situation. As just indicated, the 
numerical values, as s’s, would first 
have to be obtained by factor analysis, 
but, once having so obtained them, 
one can therefrom derive the changes 
in the Ts and re-enter the factor 
analytic prediction with the original 
s’s for each given situation. 
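
The equivalence of the two readings can be verified for a single product term. The sketch below rescales the trait score by the ratio of in-role to out-of-role loading and confirms that the change in the product is the same either way. All values are hypothetical, and the rescaling is defined only when the out-of-role loading is not zero.

```python
def rescaled_trait(trait, s_out, s_in):
    """Return T_1ri = T_1i * (s_jr1 / s_j1); undefined for a zero out-of-role loading."""
    if s_out == 0:
        raise ValueError("out-of-role loading must be nonzero to rescale the trait")
    return trait * s_in / s_out


s_j1 = 0.4    # hypothetical out-of-role loading
s_jr1 = 0.6   # hypothetical in-role loading
T_1i = 1.5    # hypothetical trait score of person i

T_1ri = rescaled_trait(T_1i, s_j1, s_jr1)

# Left side of Equation 11: the change treated as a change in loading.
lhs = (s_jr1 - s_j1) * T_1i
# Right side of Equation 11: the same change treated as a change in trait score.
rhs = s_j1 * (T_1ri - T_1i)
print(round(lhs, 3), round(rhs, 3))
```

Since the two sides agree, the in-role product is the same whether it is computed as `s_jr1 * T_1i` or as `s_j1 * T_1ri`, so the choice between the two formulations has to rest on other grounds.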

It is at first surprising to discover 
that there is nothing within the 
familiar model itself, and the usual 
empirical uses we make of it, to give 
greater legitimacy to either of these 
radically different alternatives! As in 
some aspects of other sciences, such as 
those which led to relativity in 
physics, either of two formulations, 
if consistently followed through, will 
continue to fit all the major facts. 
However, two considerations lead us, 
in the case of role, to decide to settle 
upon the theory which goes with a 
changing of the situational indices, 
instead of the personality source 
traits. First, we believe, on scant 
present evidence, that the changes of 
$T$ values, i.e., the modulating vector, 
would be likely to be different in 
different situations involved in a role, 
i.e., in different rows of the factor 
matrix. Now, if we think of the role 
itself as changing the total “per- 
sonality,” in a constant fashion, this 
is inconsistent. Second, since many 
aspects of personality are thought of 
as having physiological associations, 
and persistent, built-in directives that 
could not change with the speed with 
which an individual is known to enter 
a role, such a definition of personality 
(as a relatively permanent set of 
constants), precludes this conceptual- 
ization. However, we shall shortly 
encounter another kind of modulator 
which does require alteration of the 
personality measures, and which leads 
to a concept of modulators broader 
than that which we would develop 
from role alone. 


ROLE FACTOR AND ITS IMPLICATIONS 


A role has so far been defined as 
recognizable by a type pattern of 
behavior, and by a role vector chang- 
ing behavior estimation from out-to-in 
role behavior. A further associated 
structure must now be recognized in 
the role factor. It has been recognized 
that the global stimulus evokes role be- 
havior only in personalities equipped 
by previously learned habits (at- 
titudes and skill structures) so to 
respond. The arguments below sug- 
gest that for any one role there will 
appear, experimentally, a factor struc- 
ture, or possibly a small set of factors, 
giving the dimensions of this habit 
system. Since the scores of persons 
gathered in the role are not always 
at one extreme of the population 
range, but probably concentrate at 
some intermediate mode, the first 
discovery of such factors by ordinary 
linear correlation methods may be 
somewhat obscured by a curvilinearity 
not fitting the factor model. But with 
attention to special scoring trans- 
formations fitting the roughly emerg- 
ing factors (Cattell, 1952), it should be 
possible eventually to handle these 
phenomena by the regular model. 
Psychologically we should expect the 
role factors thus revealed to be of 
dynamic modality, and probably con- 
nected, in order arrangements, 
under the major self-sentiment factor. 
To see this more clearly, let us take 
Table 2 to represent the result of 
factor analyzing 21 variables on a 
large sample of people, each person in 
which is capable of reacting in some 
degree in four roles, $R_q$, $R_r$, $R_s$, and $R_t$. 
For economy, we have represented 
the more general personality 
dimensions, which might normally require 
20 to 30 factors, by only 3, $T_a$, $T_b$, 
and $T_c$, and we are going to suppose 
that the roles are represented by 
factors $T_q$, $T_r$, $T_s$, and $T_t$ in the 
factor structure. There are seven 
situations, $s_1$ through $s_7$, and each is 
measured in three different media of 
response observation (Cattell, 1957), 
e.g., $p_{j1}$ for rating, $p_{j2}$ for 
questionnaire, and $p_{j3}$ for objective test. 
These are introduced to remind the 
reader that in any examination of 
such a factor matrix, we should expect 
to find instrument factors, $F_x$, $F_y$, and 
$F_z$, which need to be separated from 
the meaningful psychological traits, 
the $T$s (Cattell & Warburton, 1963). 
If, now, a subset of variables, like 
1, 2, ..., 14, and 19, are all 
measures of responses in a role 
situation, such as number of miles 
about town covered per day, 
frequency of wearing a blue shirt, 
number of letters carried, number of local 
addresses known, etc., then it is 
obvious that a factor would crystallize 
out, symbolized by $T_q$, a high score on 
which would distinguish postmen as a 
TABLE 2 


HYPOTHESIZED PATTERNS OF OVERLAP OF PERSONALITY, ROLE, AND 
INSTRUMENT FACTORS IN TYPICAL SAMPLE OF VARIABLES 


Variable | Tie | Personality factors ($T_a$, $T_b$, $T_c$) | Role factors ($T_q$, $T_r$, $T_s$, $T_t$) | Instrument factors ($F_x$, $F_y$, $F_z$) | Statistical breakdown ($u$, $h^2$) 
1 Srp | a qı si | x | | “i h’ 
2 | sr—pi: | a: qQ |r h V | Uy h’: 
3 SiPia | aa | Ss | | i KE hs 
4 Se—pu | as Is xX: | | u4 hy 
5S |srpn | as |b: | Ss | 9 | us | his 
6 Sr—ps | as | b: lz | ge us hts 
7 Sy—Ppn | G7 bs r3 } Xs uy hy; 
8 Ss—Paz | Os bs q: Yi tis hès 
9 | srpnu | a | bs h | 23 tig h% 
10 | sepa | Gio | be % BE tio | Ato 
11 Sepa | an | b: h | N un | hu 
12 | se—pa | 0 | bs re | Sa | | A tae | hhi 
13 | ss—pu | 0 |ò |c rs | | xs tis | hên 
14 |sspsa | O | bio | c | g | | ys tss | Trg 
15 ss—pa | 0 3 | ts | is uis | his 
16 Se—pa | 0 | cs Te | Xe tre | Aris 
17 Se— pe: 0 cs | | te Ys Mar hy, 
18 Ss—pex | 0 ce | ss | Ze ms | Mis 
19 | spn | 0 | a qs | | a | to | hèi 
20 | ss—pz | 0 cs rm | | | Pr uz | hz 
21 srpa | 0 E rs | | 27 ua ha 


Note. Responses $p_{11}$, $p_{12}$, and $p_{13}$ are three different possible responses to the same physical situation $s_1$, and 
so on for others. 


group from other groups.¹ The 
possibility must be considered, incidentally, 
that two or more factors might actually 
be required to cover what lies in a 
particular role behavior, but we will 
proceed on the simplest picture. 
However, the main point arising from this 
section is that what has been 
considered as a general modulation of $s$ 
values would actually be largely a 
modulation upon the $s$ value of the role 
factor. This factor would always be 
operatively present in the specification 
equation, but with quite small 
loadings in anything but the role situation. 

¹ This factor might at first be somewhat 
obscured, as just mentioned, by skewed 
distribution, in that only a small minority show 
appreciable scores; by a curvilinear relationship, 
in that, as stated above, some roles are defined 
by optimal rather than extreme performances; 
and by the all-or-nothing character of some of 
the responses. In the space of this article, we 
shall not digress into the operational steps 
for handling this problem. 

Psychologically, though the data is 
yet fresh and fragile in the area of 
dynamic structure factorization (Cat- 
tell, 1950; Cattell & Baggaley, 1958; 
Cattell & Horn, in press; Sweney & 
Cattell, 1962), we shall hypothesize 
that a role is a collection of mental sets 
within a dynamic structure factor. 
When one reflects how many small 
acts, e.g., saying good-bye to rela- 
tives, registering as a student, passing 
through customs, involve mental sets 
which persist over only a very narrow 
and brief set of responses, one might 
speculate, indeed, that the factor
analysis of dynamic traits would yield
factors extending to an increasing
number of small factors. Indeed, the
specification equation might trail off
into a veritable detritus of tiny factors 
representing brief mental sets. How- 
ever, the probability is that whole 
sets of these are organized hier- 
archically under the larger role factors 
we are now speaking about, for such 
subrole behaviors are practiced only 
by people experienced in the major 
roles. 

Nevertheless, it would also be 
possible for there to exist a number of 
“mini-roles,” which are instruments
to too many distinct larger roles to be 
expected to show a clear hierarchical 
vassalage to any one, in the hier- 
archical factor analytic sense. These 
would, indeed, cause the specification 
equation to trail off into a lot of small, 
narrow dynamic factors. Inciden- 
tally, bringing the old, familiar concept 
of “mental sets” under that of factor 
analytically discoverable, objective, 
dynamic-structure, source traits, in- 
volves implications in the reverse 
direction too. It argues that the 
properties of dynamic structure fac- 
tors should be investigated in terms of 
their being nothing but very broad 
mental sets. 

In recognizing the powerful effect 
of a role factor, we are not departing 
from the definition of role action as a 
situational index modulation which 
affects several personality factors— 
including the role factor. For the 
appearance of a role factor, for 
reasons proposed above, cannot itself 
be regarded as that which modifies 
the values on the other factors. Like 
them, it is but a factor, latent and 
ready to act when the situation 
provokes it, and having zero effect 
until multiplied by a nonzero s. If 
one asks what it is that is really new 
(and changes the s’s) in the role 
situation, it becomes evident that our 


theory has constructed everything but 
the trigger in this model for role 
action. Incidentally, this lack does 
not show up in the permanent role, for 
we have said that one and the same 
physical situation, scp, will be per- 
ceived by a person unequipped with 
the role in one way, and, by a person 
endowed with the role factor in 
another. On the other hand, in the 
case of the temporary role, what 
actually makes the person perceive it 
on one occasion as $s_j$ and on the other
occasion as $s_{jr}$? This modulating influence
can only be a new influence, m,
which determines the individual pos- 
sessed of a role factor to bring it into 
action, and also affects his other factor 
involvements. From general psy- 
chology we know that it could be some 
spontaneous recollection within the 
individual, some new feature in the 
physical situation, $u) which makes it 
Sum, Or, some stimulus in St) in the 
external world encountered before the 
individual meets the situation, su): 
In any case, it must be viewed as 
activating the role factor, and parts 
of certain other factors, which would 
not have become active without its 
arrival. The above formulation in- 
cludes, of course, the possibility that 
the role factor will be negatively loaded 
on role behaviors (or that s's will be
negative), producing that inhibition in
a role situation which makes the range
of response smaller than in those
unrestricted by the role.
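
The point that a role factor lies latent until the situation supplies a nonzero index can be illustrated by a minimal numerical sketch. It assumes the linear form of the specification equation (a response as the sum of products of situational indices and trait scores); the function name and every numerical value are hypothetical and serve only as illustration.

```python
def predict_response(situational_indices, trait_scores):
    """Linear specification equation: sum of s_k * T_k over factors."""
    if len(situational_indices) != len(trait_scores):
        raise ValueError("one situational index is needed per factor")
    return sum(s * t for s, t in zip(situational_indices, trait_scores))


# Hypothetical standard scores: two personality factors, then the role factor.
person_with_role = [0.5, -1.0, 2.0]
person_without_role = [0.5, -1.0, 0.0]

# Hypothetical indices for one and the same focal stimulus.
s_out_of_role = [0.6, 0.3, 0.0]   # role factor present but multiplied by zero
s_in_role = [0.4, 0.3, 0.7]       # the trigger has modulated the indices

response_out = predict_response(s_out_of_role, person_with_role)
response_in = predict_response(s_in_role, person_with_role)
response_in_no_role = predict_response(s_in_role, person_without_role)
```

In this sketch the person endowed with the role factor responds differently in and out of the role only because the indices change; his trait scores are the same in both calls.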


EXTENSION OF MODULATORS TO 
INCLUDE THREE TYPES 


On looking at related psychological 
problems, one perceives that there are 
several very similar phenomena, not- 
ably that of moods, of recent learning 
effects, of transiently excited massive 
attitude sets, of multiple personality, 
of fatigue, etc., which require very 
similar treatment. 
Unfortunately, to introduce the
mood concept is to drag in something 
still used in psychology only at a 
popular level, and which requires 
preliminary structuring. The applica- 
tion of P technique and Incremental 
R technique to a wide array of be- 
havioral and physiological variables 
has revealed, so far, nine reasonably 
stable and intelligible dimensions of 
state change (Cattell & Scheier, 1961). 
There are undoubtedly many more, 
especially in the dynamic field (Cattell 
& Cross, 1952). However, insufficient 
research has been done to decide 
whether, in every case, these change 
factors should be considered as sepa- 
rate mood factor patterns in the 
specification equation, or whether the 
pattern is simply the change pattern 
upon a relatively permanent per- 
sonality trait which has always existed 
in the ordinary R technique specifica- 
tion equation. What seems probable, 
e.g., from such manipulative P tech- 
nique experiments as those of Grinker, 
and Cattell and Scheier (1961) is 
that variation on traits and mood 
states are two distinct things, e.g., 
the pattern of anxiety as a state is 
somewhat different from change on 
anxiety as a trait. Further, even 
among states, we may have to recog- 
nize a difference between internal, 
endogenous rhythms and true moods. 
The former we can distinguish from 
externally provoked moods by calling 
them humors. In any case, it seems 
reasonably certain that the ‘‘perman- 
ent” dimensions of personality (traits 
obtained by R technique, on scores 
averaged across days) show in two- 
thirds of the cases some diurnal 
variation, as in the factors of anxiety, 
ego strength, and cortertia (though 
others such as intelligence have not 
yet been shown to do so). 

Now what one knows about the 
psychological and physiological char- 


acter of a mood (Cattell, 1957; Nowlis 
& Green, 1957) makes one averse to 
expressing its modulator action only 
as a change in the situational indices. 
It is true that, just as with role, a set 
of focal situations are perceived dif- 
ferently, in the global context of the 
mood, for an appreciable time. But 
in this case we do not hesitate to 
conclude, from, e.g., physiological 
evidence, that the modulation must 
affect the organism, i.e., the T values 
for trait, mood, and humor factors 
(for state factors can go alongside 
trait factors) on the right side of 
Equation 11. Our theory differs, how- 
ever, from that which would simply 
indicate using the same specification 
equation with changed T values. For 
it supposes that change of response is 
not merely due to direct change in the 
level of mood factor endowment. 
Instead, it defines a mood as a modulator,
i.e., something which also
“sensitizes” other factors by
operating to change the situational
indices upon them.

Indeed, in regard to all these psy- 
chological influences now being dis- 
cussed, though a common modulator 
concept is possible and useful, one 
must not be seduced into over- 
simplification. At present it is best to 
experiment with three variants, con- 
sisting of two models and their com- 
bination, as follows: 


1. Organism dimension modulators. 
These, like moods, humors, and new 
learning experiences, produce revers- 
ible and irreversible changes on organ- 
ism dimensions, in the form of a 
unique pattern for each modulator, 
and thus affect all situations, while the 
modulation persists, through the ordi- 
nary specification equation. 

2. Situation-interaction modulators.
These, like roles and mental sets,
change the situational indices (the
perceptual interaction with the environment)
by which dimensions produce
responses. As separate environmental
influences or cues they may
uniformly change all focal stimulus
indices, or they may have to be
represented by a role matrix to
represent what is peculiar to each
stimulus situation for the given role.²

3. Combined organism and inter- 
action or “complex” modulators. It 
is psychologically conceivable and 
mathematically expressible that the 
modulating influence should primarily 
alter the level on one or more personality
dimensions (Ts), and that
some change in level would coincide 
with a change of situational involve- 
ments (s's) on these and other Ts. 
There seems no doubt that such 
changes as the latter exist, from 
Saunders’ (1956) work on “moderator 
variables,” (though he investigated 
single variables rather than factors, 
and one needs to allow for the purely 
statistical effects of changes of means 
and sigmas of one set of Ts and s's in 
a specification equation upon others 
before concluding that some ad hoc 
new effect also exists). 

These three kinds of modulators 
can be expressed in formulae as 
follows: First, 

$$m_s = s_{m1}, s_{m2}, \ldots, s_{mk} \tag{13}$$

which is added to the s’s, and where
$m_s$ is a situation-interaction modifier,
obtained as in Equation 9 above.
2 It will be noted that this modulation of the
personality, which includes mood and humor
changes, would also integrate well with the
recent (Cattell & Scheier, 1961) formulation
of personality learning as “multi-dimensional
change through exposure to a multi-dimensional
situation.”

Second, an organism dimension modulator, $m_o$:

$$m_o = (T_1 - T_{1m}), (T_2 - T_{2m}), \ldots, (T_k - T_{km}) = T_{1o}, T_{2o}, \ldots, T_{ko} \tag{14}$$

which is added to the Ts, and where
the $T_m$’s are obtained as in Equation
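11 above. Third, a complex modulator,
$m_c = m_{co} + m_{cs}$, where

$$\text{(i)}\quad m_{cs} = s_{m1}, s_{m2}, \ldots, s_{mk}$$

added to s's, plus

$$\text{(ii)}\quad m_{co} = T_{1o}, T_{2o}, \ldots, T_{ko} \tag{15}$$

added to Ts, but where each s is a function of all Ts:

$$\text{(iii)}\quad s_{mi} = f_1(T_1) + f_2(T_2) + \cdots + f_k(T_k).$$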
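
The three kinds of modulators can be contrasted in a short numerical sketch. It is offered only as an illustration under stated assumptions: the specification equation is taken as linear, the functions in (iii) are assumed to be linear and to act on the shifted trait levels, and all names and values are hypothetical.

```python
def respond(s, T):
    """Specification equation: sum of s_k * T_k."""
    return sum(si * ti for si, ti in zip(s, T))


def add_vectors(base, increment):
    """Element-wise sum of two equally long vectors."""
    return [b + m for b, m in zip(base, increment)]


def situation_modulated(s, T, m_s):
    """Equation 13 type: increments are added to the situational indices."""
    return respond(add_vectors(s, m_s), T)


def organism_modulated(s, T, m_o):
    """Equation 14 type: increments are added to the trait levels."""
    return respond(s, add_vectors(T, m_o))


def complex_modulated(s, T, m_o, coefficients):
    """Equation 15 type: the Ts are shifted, and each index increment is a
    sum of functions of all Ts. The functions are ASSUMED linear here and
    are applied to the shifted Ts: s_mi = sum_j coefficients[i][j] * T_j."""
    T_new = add_vectors(T, m_o)
    m_s = [sum(c * t for c, t in zip(row, T_new)) for row in coefficients]
    return respond(add_vectors(s, m_s), T_new)


# Hypothetical two-factor case.
s = [0.5, 0.2]
T = [1.0, -1.0]
unmodulated = respond(s, T)
by_role = situation_modulated(s, T, m_s=[0.1, 0.3])
by_mood = organism_modulated(s, T, m_o=[0.5, 0.0])
by_complex = complex_modulated(s, T, m_o=[0.5, 0.0],
                               coefficients=[[0.0, 0.0], [0.2, 0.0]])
```

The complex case differs from the organism case only in that the shift on the first trait also alters the index on the second, which is the kind of "sensitizing" effect the text describes.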


A full handling of the derivations,
and of the means of empirical statistical
testing of the last, would take us too
far afield for this article. The choice
between Equation 13 and Equation
14, however, is to be made, as stated,
on two grounds. First, in Equation 14 the
numerical modulation values should
hold across a whole set of specification
equations, but not across different
population selections, whereas Equation 13
need not hold across many situations,
though a range-corrected function of it
should hold for the same situation
across different populations. Second, the
Equation 13 modulation should arise,
experimentally, with immediate cue
changes, whereas that of Equation 14 should
be slower and more lasting.

Although one can rather confidently
hypothesize, before experiment, that
the situation-interaction modulators,
of the type $m_s$, will be different for
each situation in the same role (and
therefore require a role matrix), yet
it is possible that experiment will
demonstrate that all $m_s$’s in a single
role have a class membership. To
investigate this it is proposed that all
row vector modulators for several
supposed roles and many situations
be intercorrelated (using $r_p$ instead
of r). Cluster search might then
reveal that each role generates a
cluster. If so, this would constitute a
third way of recognizing the existence
of a unitary role entity, i.e., by adding
$m_s$ cluster definition to behavioral
species type and role factor definition.
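
The proposed intercorrelation and cluster search can be sketched as follows. The sketch assumes the pattern similarity coefficient in the form r_p = (E_k − Σd²)/(E_k + Σd²), with E_k taken as twice the median chi-square for k degrees of freedom and the elements expressed in standard scores. The median is approximated by the Wilson-Hilferty formula, and the simple linkage rule with a fixed threshold is an illustrative substitute for a full cluster search. All modulator vectors are hypothetical.

```python
def median_chi_square(k):
    """Wilson-Hilferty approximation to the median of chi-square, k d.f."""
    return k * (1.0 - 2.0 / (9.0 * k)) ** 3


def pattern_similarity(x, y):
    """Assumed form of r_p for two profiles of k standard-score elements."""
    k = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    e_k = 2.0 * median_chi_square(k)
    return (e_k - d2) / (e_k + d2)


def cluster_by_similarity(vectors, threshold):
    """Group vectors that are linked, directly or through intermediaries,
    by a similarity at or above the threshold (single linkage)."""
    unassigned = set(range(len(vectors)))
    clusters = []
    while unassigned:
        seed = min(unassigned)
        unassigned.remove(seed)
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            linked = [j for j in unassigned
                      if pattern_similarity(vectors[i], vectors[j]) >= threshold]
            for j in linked:
                unassigned.remove(j)
                cluster.append(j)
                frontier.append(j)
        clusters.append(sorted(cluster))
    return clusters


# Hypothetical modulator row vectors: two situations in each of two roles.
vectors = [
    [1.0, 0.0, -1.0], [0.9, 0.1, -1.1],   # supposed role A
    [-1.0, 1.0, 0.5], [-1.1, 0.9, 0.4],   # supposed role B
]
clusters = cluster_by_similarity(vectors, threshold=0.5)
```

If each supposed role does generate its own cluster, as here, the cluster membership offers the third line of evidence for a unitary role entity.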

To conclude this formulation it is 
necessary to give more than the 
cursory definition yet accorded to the 
“trigger” which sets the modulators, 
as now sufficiently defined above, into 
action, for varying periods. Our 
discussion has hitherto referred to an 
ambient stimulus, which may occur 
either with the focal stimulus or 
before it, instituting a ‘“‘set,” a mood 
or humor change, referable either to 
a preceding experience or to a physio- 
logically induced change, or, a learning 
experience. In all cases, what we may 
call the modulating stimulus or condi- 
tion is either an external situation or 
an internal, physiologically originat- 
ing change of condition which is time- 
bound, i.e., endures only for a stated 
period. To track down and discover 
the existence and particular nature of 
these modulators it is necessary to 
enter a factor analytic experiment 
with a sufficiently diverse variety of 
focal stimulus-response bonds hypo- 
thetically occurring with and without 
possible modulators. After correla- 
tion of rotated factor matrix rows has 
revealed groupings into “types,” i.e., 
into distinct modulator influences, it 
is a matter of psychological insight to 
examine all situations sharing a modu- 
lator to determine the nature of the 
particular, simultaneous, internal, or 
prior condition common to them all. 
In the case where this modulator 
stimulus is an external, environmental 
situation the procedures have already 
been discussed more fully elsewhere, in 
connection with developing an ob- 
jective taxonomy of stimulus situa- 
tions in a science of econetics, i.e., 


broader psychophysical laws (Cattell, 
1961b). 

At this point it should finally be 
specified that differences in any modu- 
lator between the “in” and the “out” 
role or state must be taken between 
the “in” and (since the ‘‘out’’ does not 
exist as such) the average of all other 
situations in which the focal stimulus 
is encountered. The “pure personality”
response of a particular trait is
the average performance, j, across 
many trait-loaded role situations, just 
as the pure (unmodulated) situation 
perception is represented (for any one 
factor) by the average of $s_{jr1}$, $s_{jr2}$, etc.,
across all kinds of role encounters with 
that situation. 
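
The rule just stated can be put in a brief sketch: the modulator for a role is taken as the difference between the indices found in that role and the average of the indices found in all other encounters with the same focal stimulus, while the unmodulated perception is the average across all role encounters. The role labels and index values below are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of equally long vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def role_modulator(indices_by_role, role):
    """'In' role indices minus the average over all other role encounters."""
    others = [v for name, v in indices_by_role.items() if name != role]
    if not others:
        raise ValueError("at least one other role encounter is needed")
    baseline = mean_vector(others)
    return [a - b for a, b in zip(indices_by_role[role], baseline)]


# Hypothetical situational indices for one focal stimulus on two factors.
indices_by_role = {
    "father": [0.6, 0.1],
    "club official": [0.2, 0.5],
    "guest": [0.4, 0.3],
}
unmodulated = mean_vector(list(indices_by_role.values()))
father_modulator = role_modulator(indices_by_role, "father")
```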


PERSONALITY AND ROLE TRAIT 
ESTIMATION AND ANALYSIS OF 
RESPONSE VARIANCE INTO 
TRAIT AND SITUATION 
SOURCES 


It remains to see how the above 
concepts will permit one to give 
precise answers to age old questions 
such as, “How much of this behavior 
is due to personality and how much to 
role?” and, “Is this extreme reaction 
due to an extreme stimulus occasion?”
and “How much of behavioral vari- 
ance is assignable to personality, role, 
mood, and situation differences?” etc. 
For what in common or clinical speech 
is referred to as the “importance” of a 
concept can be given operational 
meaning as its relative variance con- 
tribution. 

The covariation chart, or basic data 
relation matrix (Cattell, 1946, 1957; 
Coan, 1961) reminds us that any 
psychological measurement has four 
signatures: stimulus situation, person 
(or organism), response, and condi- 
tion, on the occasion (excluding the 
fifth, observer, as irrelevant for this 
discussion). We shall assume that 
each of these has been chosen in a
stratified sample from its population 
(except that stimuli and responses 
cannot be completely randomized be- 
cause, even with the utmost learning, 
some responses cannot be, or at any 
rate are not, pairable with some 
stimuli). 

On the basis of this covariation 
chart we can assign the variance of 
any particular response, situation, 
etc., to fractions associated with other 
elements. The multivariate experi- 
mentalist, whose newer operationism 
is associated with patterns rather than the
single variables of the bivariate “brass
instrument” experimentalist, will nat- 
urally want to see analyses of variance 
made with respect to these patterns 
rather than single variables. These 
patterns include personality factors, 
role factors, ambient or modulator 
stimulus patterns, and so on. To estimate
any individual’s score on any
personality factor (and thence the 
variance on a personality factor, or 
personality factors in general) we 
must, as the above conceptualization 
will have indicated, obtain a weighted 
(by factor weights) total from per- 
formances loaded in that factor and 
scattered across a good sample of roles. 
For example, to get a least biased 
estimate of personality factor of Ts 
in Table 2 one might take Variables 5, 
6, and 10, i.e., scores randomized 
across three different roles. 
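
A minimal sketch of such an estimate follows. It assumes that the estimate is a simple weighted total of standard scores, and it refuses to proceed unless the variables are spread over more than one role. The weights, scores, and role labels are hypothetical and are not taken from Table 2.

```python
def estimate_factor_score(observations):
    """observations: list of (standard_score, factor_weight, role_label)."""
    roles = {role for _, _, role in observations}
    if len(roles) < 2:
        raise ValueError("variables should be spread over several roles")
    return sum(z * w for z, w, _ in observations)


# Hypothetical scores of one person on three variables loaded on the factor,
# each observed in a different role; the weights are illustrative only.
observations = [
    (1.2, 0.4, "role A"),
    (0.8, 0.3, "role B"),
    (1.0, 0.3, "role C"),
]
score = estimate_factor_score(observations)
```

Spreading the variables over roles is what lets the role contributions tend to cancel in the total, leaving an estimate less biased by any one role.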

Conversely, in estimating a role 
factor, one should seek to take vari- 
ables highly loading the role factor 
but well distributed in regard to their 
loadings on various personality fac- 
tors. Incidentally, both, of course, 
should further be distributed in rela- 
tion to instrument factors, for reasons 
brought out in recent discussion on 
instrument factors (Cattell, 1961a), 
and on methods of measurement 
(Campbell & Fiske, 1959). The role 
variables, as pointed out at the beginning,
are likely to be, in a certain
sense, narrow and specific, and con- 
nected with sociologically identifiable 
situations, whereas the variables aptly 
used in personality factor estimation 
prove to be better taken from a great 
variety of situations, or, alternatively, 
from such situations as prove to have 
very little loading on any role factor.

Reciprocally, in obtaining the mean 
vector for a role, or any other situa- 
tion-interaction modulator, one would 
choose a set of $s_{jr}$’s which have been
shown to belong to that one role and 
which are otherwise evenly sampled 
over other roles. Parenthetically, the 
possibility of separate scoring of role 
and personality factors has consider- 
able importance in several fields, 
notably parent-child study. The 
curvilinear relation found, notably by 
Cyril Burt (1948), between strictness 
of parental discipline and moral 
stability of the child might well prove 
explicable in terms of a different 
reaction on the part of the child to 
strictness emanating from a parental 
personality trait of hostility-rejection 
and from a role trait of adopting an 
ideal of parental strictness. 

At this juncture it may be necessary 
to dispel a false assumption possibly 
created by our initial oversimplification
of approach. The notion of focal,
ambient, and global situation with 
which we began, tends, in the writer's 
experience, readily to be generalized 
to mean that all life situations are like 
a Chinese nest of boxes, situation 
within situation, in some kind of 
hierarchy. 

Certainly some hierarchies will exist, 
and it should be the aim of experiment 
to explore them, but the more general 
condition is likely to be a set of 
irregularly overlapping roles. It must 
be recognized, indeed, that the individual
is frequently operating simultaneously
in several overlapping
role mental sets. For example, a 
father taking his son along to another 
city’s branch of the X Club to which 
he belongs at home is simultaneously 
in the roles of a father, a club official, 
a guest, and a host. But, except for 
the main hierarchy in which subroles 
and roles fit consistently into the self- 
sentiment dynamic factor (Cattell, 
1957; Sweney & Cattell, 1962), it is
hypothesized that roles can come 
together in any sociologically possible 
way. Consequently, the estimation 
of a role factor like T, in Table 2 
would rest on variables like 2, 12, and 
20 which cut across (or suppress) 
different personality factors, role fac- 
tors, and instrument factors. 

When personality and role factor 
scores on the one hand, and situational 
groupings on the other have been 
explored, it is possible to turn to the 
question which often bothers clini- 
cians, juries, and others in contem- 
plating an exceptional act: “Is this an 
egregious response because Smith is 
exceptionally endowed in Trait X, 
or is adopting Role Y in too extreme a 
form, or because he has encountered 
an extreme situation?” An answer 
can be given in terms of the standard 
score estimate of his personality trait, 
of his role trait, and of the extremity 
of the loading of the situation in a 
general population of situations re- 
ferred to these traits. And, in general 
terms, an analysis can be made of the 
total variance of a particular kind of 
response into that associated with 
variance on a personality trait, on a 
modulating factor, and on varieties 
of situation which permit that re- 
sponse. 
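
Both kinds of answer can be sketched numerically. The sketch assumes uncorrelated sources expressed in standard scores, so that each source contributes the square of its loading to the unit variance of the response, and it treats the product of loading and standard score as that source's share in one person's response. The loadings and Smith's scores are hypothetical.

```python
def variance_fractions(loadings):
    """loadings: dict of source name -> loading of the response on it.
    ASSUMES uncorrelated sources in standard scores, so that each source
    contributes loading**2 of the unit response variance."""
    fractions = {name: s * s for name, s in loadings.items()}
    common = sum(fractions.values())
    if common > 1.0 + 1e-12:
        raise ValueError("squared loadings exceed the total variance")
    fractions["unique"] = 1.0 - common
    return fractions


def individual_contributions(loadings, standard_scores):
    """Each source's contribution s_k * T_k to one person's response."""
    return {name: loadings[name] * standard_scores[name] for name in loadings}


# Hypothetical loadings of one response on three sources.
loadings = {"personality trait": 0.6, "role factor": 0.5, "mood": 0.3}
fractions = variance_fractions(loadings)

# Hypothetical standard scores for Smith on the same sources.
smith = individual_contributions(
    loadings, {"personality trait": 0.2, "role factor": 2.5, "mood": 0.0})
largest_source = max(smith, key=lambda name: abs(smith[name]))
```

With these invented values the personality trait carries the larger share of variance in the population, yet Smith's own exceptional response traces chiefly to his extreme role score, which shows why the two questions need separate answers.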

The value of the modulator concept 
is that of any scientific formulation:
parsimony in explanation and econ- 
omy in computation. There exists 
virtually an infinite number of specific 


situations and specification equations. 
By taking n specification equations
and m modulators (for roles, moods,
common learning experience, etc.) we
obtain predictions in n × m situations
more economically than by measuring
for all n × m specification equations,
and with greater insight and control 
in regard to further conclusions. 
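
The economy can be made concrete in a small sketch. It assumes the simplest case, in which each modulator is a single vector added uniformly to the indices of every situation; where a full role matrix is required the saving would be smaller. The situation labels, modulator names, and values are hypothetical.

```python
def modulated_equations(base_equations, modulators):
    """Derive one index vector for every (situation, modulator) pair by
    adding the modulator vector to the base situational indices."""
    result = {}
    for situation, s in base_equations.items():
        for name, m in modulators.items():
            result[(situation, name)] = [si + mi for si, mi in zip(s, m)]
    return result


# Hypothetical case: n = 3 situations, m = 2 modulators, k = 2 factors.
base_equations = {"s1": [0.6, 0.1], "s2": [0.2, 0.5], "s3": [0.4, 0.3]}
modulators = {"role": [0.1, 0.2], "mood": [-0.2, 0.0]}
k = 2

predicted = modulated_equations(base_equations, modulators)
stored_values = (len(base_equations) + len(modulators)) * k   # (n + m) * k
direct_values = len(predicted) * k                            # n * m * k
```

The gap between the two counts widens rapidly as n and m grow, since one grows with their sum and the other with their product.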


SUMMARY 


1. A model, with attendant psy- 
chology theory, has been presented 
comprehensively relating role, per- 
sonality trait, group, stimulus situa- 
tion, and mood, in which the main 
novelty of formulation is the concept 
of role and mood as “modulators.”

2. Roles, whether temporary or 
permanent, common or unique, in- 
volve the capacity to react to a focal 
stimulus differently in a global situa- 
tion. This may be considered change 
in perception, due to an ambient 
stimulus, which can be written as a 
modulator vector, obtained by com- 
paring the specification equation and 
the mean raw score response magni- 
tude for the same physical situation 
reacted to in and out of the role. 

3. The perceptual meaning of a 
situation has a personal component 
from the trait scores of the individual 
and a situation-interaction compo- 
nent, defined by the situational indices 
(factor loadings) for that species of 
population. The latter must be 
experimentally obtained in controlled 
conditions in which the response 
involves nothing more than a recogni- 
tion response to the situation, other- 
wise the s’s will express also diffi- 
culties, etc., in making the motor 
response. 

4. Roles belong to the general class 
of modulators, which can be described 
as influences operating across several 
specific situations, and initially expressed
as a modulating vector for the
situational indices of each situation.
Further observations on the constancy
of the modulation across
several situations, and on the speed and
durability of the modulation, permit,
however, division into three classes
of modulators: Situation-interaction 
modulators, as in roles, representable 
by a situation index change matrix; 
organism modulators, as in moods, 
personality learning, and humors, 
wherein the change in the product 
term, sT, is considered due to a change 
only in T, the trait; and, complex 
modulators, in which a change in a 
trait level may produce change in 
situational indices on other traits. 

5. The greater part of the modula- 
tion for a role act occurs on the loading 
for a role factor, which belongs to the 
modality of dynamic personality factors.
A given (common) role is thus
locatable as a type, by Q′ technique
(using $r_p$ or other similarity indexes
which respect nonlinear relations); as
a factor; and as a clustering of modulator
patterns. In all these approaches
regard must be paid to skewed dis- 
tributions, inclusion of some non- 
linear relations, and, in the typology, 
the existence of noncontributory di- 
mensions. 

6. No act is performed without in 
some degree involving role behavior. 
The attempt at unbiased estimation 
of a personality factor therefore 
involves samplings of personality- 
factor-loaded variables across di- 
verse role factors. Similarly the 
estimation of a role factor score is 
achieved by sampling across person- 
ality source traits (and both across in- 
strument factors). Thus approached 
it is possible to say how much variance 
in a response is due to personality (or 
one particular personality factor) 
variance, to role (or one role factor)
variance, and to variance across
occasions (not involved in the role
definition) to which the reaction is a 
possible response. Similarly it is 
possible to say that a response of a 
particular extremity, by a particular 
person, is associated with a particular 
extremity either of personality trait 
endowment, or of role adoption or of 
occasion. 

7. The theory of modulators offers 
the scientific economy of avoiding a 
vast proliferation of specification 
equations. Modulators are located by 
type grouping of specification equa- 
tion modulator vectors, and finding 
the ambient stimulus, physiological 
conditions, etc., associated with each 
group. Thereafter modulators are 
used as a concept introduced to 
modify a relatively limited number 
of personality-situation specification 
equations, by reference to a role 
matrix, etc., whenever the modulator 
is known to be present. 
CHOLINERGIC MECHANISMS IN THE CONTROL OF
BEHAVIOR BY THE BRAIN

Several experiments indicated that a cholinergic system in the brain
antagonizes a 2nd system, which activates behavior. Neuropharma- 
cological considerations suggested certain drugs with which the activity 
of these systems could be altered. These experiments indicated that 
the cholinergic system acts selectively by preferentially antagonizing 
the effects of activation on unrewarded behavior. That is, there ap- 
pears to be a cholinergic involvement in the mediation of the effects 
of nonreward. Although such interpretations are necessarily very 
tentative, there are inferential grounds for supposing that a cholinergic 
system selectively antagonizes the effect of activation on certain
behavior and that the basis of this selectivity is the extent to which
that behavior is unrewarded.


The discovery that the “tranquilizers”
block conditioned avoidance behavior
has markedly enhanced interest
in laboratory techniques for the study
of the behavioral effects of drugs.
Behavioral techniques have come to provide
reliable and sensitive means for
detecting a drug’s actions.

Since behavior has been used only
as an indicator in these studies,
behavioral pharmacology has contributed
little to an understanding of the variables
that may control normal behavior.
Because of the sensitivity of behavioral
techniques, they have, quite understandably,
been used chiefly in efforts
to elucidate the effects of drugs having
largely unknown actions. Thus, emphasis
has been on pharmacology rather
than on behavior.

The mechanism of action of most
drugs that affect the brain is unknown.
The actions of a few drugs are, however,
sufficiently clear so that it is
feasible to reverse the emphasis and 
use the drugs as tools for the study 
of processes that may control the be- 
havior of the normal animal. The 
experiments to be discussed, and the 
conjectures based upon them, embody 
such a reversal of emphasis. 


NEUROPHARMACOLOGICAL 
CONSIDERATIONS 


The two purposes of this paper are, 
first, to discuss an hypothesis pertain- 
ing to how two systems in the brain 
may interact to control several aspects 
of learned behavior and, second, to 
describe several experiments that indi- 
rectly support this hypothesis. As an 
introduction, a brief outline of the 
pharmacological and neurophysiological 
background for the procedures involved 
in the experiments is required. 

Today most would accept as axio- 
matic the concept that naturally occur- 
ring chemical substances are involved 
in the transmission or modulation of 
the activity of the brain. Any one of 
these substances is a potential candidate 
for a role in the control of behavior 
(see reviews by Crossland, 1957; 
Giarman, 1959). The present discussion
will be concerned first with
acetylcholine, the one substance for
which there is especially convincing
evidence pointing to a significant involvement
in brain activity (e.g., Feldberg,
1957; Gaddum, 1961); and
second, with the catecholamines (epi- 
nephrine, norepinephrine, and dopa- 
mine). 

The basic problem in the study of the
role of substances that occur in the
brain lies in the manipulation of their 
activity and in the measurement of 
correlated changes in behavior. Un- 
fortunately, under most circumstances 
these substances cannot simply be in- 
jected into an animal and the conse- 
quent behavioral effects observed. 
There are two general reasons for this. 
First, these substances pass from the 
blood stream into the brain very poorly 
if at all and, second, they are so rapidly 
metabolized that any effects exerted 
would be too rapid to be evaluated by 
most behavioral techniques. Accord- 
ingly, their roles in the control of be- 
havior must be assessed by indirect 
experimental manipulations. 

Acetylcholine (ACH) is especially 
amenable to such indirect manipulation. 
Most of the actions of ACH are spe- 
cifically blocked by anticholinergics like 
scopolamine and atropine. Further, 
ACH is normally metabolized in the 
presence of the enzyme cholinesterase;
a block of the action of this enzyme 
with anticholinesterases like eserine 
(physostigmine) delays the destruction 
of the ACH and thereby allows an in- 
tensification of the effects of ACH 
(Goodman & Gilman, 1955). Thus, 
the behavioral effects of decreased or 
increased cholinergic function (i.e., 
nervous activity controlled by ACH) 
may be studied by administering such 
drugs.²


2 The extremely complex effects of attempts
to vary cholinergic activity by genetic
selection have been discussed by Rosenzweig,
Krech, and Bennett (1960). Although certain
aspects of their work are related to the
present point of view, these will not be
discussed here. Similarly, the complicated
effects of chronic manipulations that may affect
cholinergic activity will not be elaborated
upon. Discussions of these effects (Russell,
1958; Russell, Watson, & Frankenhaeuser,
1961) should, however, be consulted.


These effects are, unfortunately, not
restricted to the brain. ACH plays an
important role in the activity of the
peripheral nervous system. The behavioral
effects of these drugs could,
therefore, merely reflect changes in
ACH activity at the periphery. There
are two general ways around this problem.
First, certain drugs with strong
anticholinergic or anticholinesterase activity
pass from the blood stream into
the brain very poorly. For example,
atropine is a potent anticholinergic
both centrally and peripherally; the
related drug, methyl atropine, is more
potent than atropine in several of its
peripheral actions (Bulbring & Dawes,
1945; Graham & Lazarus, 1940), but
has much less activity in the brain (see
discussions in Carlton & Didamo, 1961;
Paul-David, Riehl, & Unna, 1960).
Accordingly, the behavioral effects of
atropine that are not duplicated by
methyl atropine can most reasonably
be attributed to atropine’s block of
ACH in the central nervous system.

The second device is more circumstantial.
Most peripheral actions of
drugs are likely to result in a general
disruption of the animal that will be
reflected in decreased responding.
Atropine, for example, produces dryness
of the mouth, pupillary dilation,
changes in cardiac rate, gastrointestinal
relaxation, and so forth. Any of
these peripheral changes is much more
likely to decrease than to increase the
animal’s responding. Thus, increases
in behavior tend to be more convincing
demonstrations of a specific central effect
because there is no simple way to
determine whether depressed responding
reflects a peripheral or central
action.

The second class of chemicals to be 
discussed is the catecholamines. Of 
these, norepinephrine and epinephrine 
have attracted interest because of the 
rough parallelism of their distribution 
(Vogt, 1954) in areas known to be 
involved in the motivational and re- 
ward functions of the brain (Olds, 
1958). Further, it has been suggested 
(e.g., Dell, 1960; Rothballer, 1959)
that they may be involved in the activ- 
ity of the reticular formation of the 
brain. 

The reticular formation has a variety 
of functions (see especially Magoun, 
1958; and also Samuels, 1959), a par- 
ticularly important one being the con- 
trol of what, in electroencephalographic 
terminology, has been called “arousal.” 
The “aroused” EEG record is charac- 
terized by fast, low voltage activity, 
whereas the “unaroused” record is 
characterized by slow, high voltage activity.
When EEG records are taken
from conscious, unrestrained animals, 
EEG activity and grossly observed, 
qualitative aspects of behavior tend to 
be correlated. The aroused EEG is 
obtained when the animal is awake and 
alert, i.e., behaviorally “activated”;
slow, high voltage activity is recorded 
when the animal is resting, drowsy, or 
asleep (e.g., Bradley & Elkes, 1957).³


3 This correlation of behavior and EEG 
breaks down following the administration of 
anticholinergics; EEG activity is slowed, 
whereas the animal appears to be either normal
or “excited” (e.g., Bradley & Elkes,
1957; Wikler, 1952). Since the usual EEG 
effects of these drugs are obtained in the 
cerveau isolé preparation, it seems likely that 
the anticholinergics act on some system that 
is involved in the mediation of EEG arousal 
and is rostral to the reticular formation (cf. 
Bradley, 1958). This more rostral system 
may involve the diffusely projecting thalamic
system (Jasper, 1961); certain functions of 
this thalamic system may be related to the 
tivates the EEG and the animal's be-


their roles must be defined indirectly. 
Amphetamine, which passes into the 
brain, is similar in structure to the 
catecholamines and mimics many of 
their effects, including EEG arousal; 
this last effect appears to require the 
integrity of the reticular formation 
(Bradley & Elkes, 1957). These con- 
siderations have led to the suggestion 
that amphetamine may mimic the ac- 
tions of brain catecholamines (Brodie, 
1959; Rothballer, 1959). Ampheta- 
mine may provide, therefore, a means 
for indirectly increasing activation. 
There is evidence to suggest that 
drugs like reserpine and chlorproma- 


suggest an involvement of the caudate
nucleus (Buchwald, Wyers, Lauprecht, &
Heuser, 1961) in these mechanisms.
zine decrease, rather than increase,
activation. Reserpine is a drug that 
depletes the stores of several substances, 
including the catecholamines, from 
their normal binding sites in the brain 
(Brodie, Olin, Kuntzman, & Shore, 
1957). This fact has prompted the 
hypothesis that the effects of reserpine 
result from the depletion of the sub- 
stances normally involved in maintain- 
ing activation. This possibility is by 
no means universally accepted. Papers 
by Brodie and Shore (1957), Costa 
(1960), Carlsson, Lindqvist, and Mag- 
nusson (1960), Carlsson (1961), Plet- 
scher, Besendorf, and Gey (1959) 
provide a sample of the pros and cons 
of the debate. For the present, it 
will be assumed that the depression of 
learned behavior seen after reserpine 
administration is related to decreased 
activation due to a depletion of the 
catecholamines. 

Chlorpromazine is another drug that 
produces a marked depression of be- 
havior. This action may be related to 
its block of the effects of the catechol- 
amines (Brodie, 1959). Thus, both 
reserpine and chlorpromazine may pro- 
duce their effects because of their com- 
mon attenuation of the activation 
required for the emission of normal 
learned behavior. 

The role of the catecholamines in the 
control of reticular activity is highly 
conjectural; the possible relation of 
amphetamine, reserpine, and chlorpro- 
mazine to the catecholamines is equally 
so. These conjectures provide, how- 
ever, both a fairly reasonable guess as 
to the physiological significance of the 
effects of these drugs and a tentative 
basis for their use in manipulating 
activation. 


ANTAGONISM OF ACTIVATING AND 
CHOLINERGIC INHIBITORY SYSTEMS 


The point of view to be elaborated 
grew out of a study of the effects of 
atropine on responding maintained by
a nondiscriminated shock-avoidance 
contingency (Sidman, 1953). The 
complex effects obtained have been 
described in detail elsewhere (Carlton, 
1962). Briefly, atropine produced a 
disruption of responding that could be 
ascribed to a loss of inhibition; that is, 
responses that rarely if ever occurred 
under normal conditions were re- 
corded following atropine. 

In subsidiary experiments, it was 
found that equimolar doses of methyl 
atropine were either ineffective or 
mimicked the effects of atropine to only 
a minor degree. Although these ef- 
fects might have been due to some 
other action of atropine, it seemed 
reasonable to suppose that they could 
be referred to a block of the action of 
ACH. Since the predominantly 
peripheral block due to methyl atropine 
was essentially without effect, it also 
seemed reasonable to suppose that this 
action was on the brain. Thus, the be- 
havioral effects of atropine could be 
attributed to a block of some cholin- 
ergic inhibitory system in the brain. 

In addition, it could be assumed that 
this inhibitory system antagonized the 
effects of a second system, which nor- 
mally activated behavior. Thus, the 
tendency to respond could be viewed as 
positively related to the level of acti- 
vation characteristic of the animal and 
inversely related to the activity of an 
antagonistic cholinergic system.⁴

This notion was to require consider- 
able amplification before it became 
particularly useful; these considera- 


4 The present discussion may also be re- 
lated to a synthesis of (a) points of view that 
treat behavior in terms of stimulus sampling 
(e.g., Estes, 1950), (b) recent neurophysiological
data on habituation (e.g., Hernández-Peón
& Brust-Carmona, 1961), and (c) the
hypothesis that shifts in activation result in 
shifts in stimulus threshold (e.g., Dell, 1958). 
A detailed discussion of this aspect of the 
problem is in preparation. 
tions will be discussed below. The
basic idea of an antagonism between 
activating and inhibitory systems did, 
however, lead to three general pre- 
dictions that appeared to be supported 
by experimental evidence. 

1. Since an activating and an inhibi- 
tory system were assumed to be recip- 
rocal, manipulations affecting these 
systems should also be reciprocal. In 
particular, increased activation and de- 
creased cholinergic activity should pro- 
duce qualitatively similar effects, and 
conversely, there should be a similarity 
of the effects of decreased activation 
and increased cholinergic activity. 

2. A related expectation was that 
the effects of increased activation should 
be greater when the activity of an an- 
tagonistic cholinergic system was 
reduced. 

3. If activation were reduced to 
near-zero levels, manipulations that in- 
creased activation should reverse the 
effects of this reduction. In contrast, 
manipulations that decreased choliner- 
gic activity should not reverse the ef- 
fects of extreme reductions in activa- 
tion. That is, the effects of reduced 
cholinergic activity were assumed to be 
due only to the loss of an inhibitory 
antagonism to an activating system; if, 
however, activation were extremely 
low, reduced cholinergic activity could 
have no effect. Under these conditions 
there would, therefore, be a breakdown 
in the qualitative similarity discussed 
in the first prediction. 
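
The logic of the three predictions can be illustrated with a toy calculation. The sketch below is not a model proposed in the text: the function `response_tendency`, its ratio form, and all numerical levels are assumptions chosen only because they make the tendency to respond rise with activation and fall with cholinergic activity.

```python
def response_tendency(activation, cholinergic):
    """Toy index (assumed form): rises with activation, falls with inhibition."""
    if activation < 0 or cholinergic < 0:
        raise ValueError("levels must be non-negative")
    return activation / (1.0 + cholinergic)


# Hypothetical "normal" animal: activation 1.0, cholinergic activity 1.0.
normal = response_tendency(1.0, 1.0)

# Prediction 1: more activation and less cholinergic activity act alike.
more_activation = response_tendency(2.0, 1.0)
less_inhibition = response_tendency(1.0, 0.5)

# Prediction 2: the same rise in activation has a larger effect
# when cholinergic activity has been reduced.
gain_normal = response_tendency(2.0, 1.0) - response_tendency(1.0, 1.0)
gain_blocked = response_tendency(2.0, 0.5) - response_tendency(1.0, 0.5)

# Prediction 3: with activation near zero, removing inhibition does nothing,
# whereas restoring activation reverses the depression.
depleted = response_tendency(0.0, 1.0)
depleted_and_blocked = response_tendency(0.0, 0.0)
depleted_then_activated = response_tendency(1.0, 1.0)

print(normal, more_activation, less_inhibition)
print(gain_normal, gain_blocked)
print(depleted, depleted_and_blocked, depleted_then_activated)
```

Any other form in which the two systems interact multiplicatively would serve equally well; the ratio is used only for brevity.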

The various considerations outlined
earlier provided the basis for evaluating
these predictions in terms of data
obtained in several experiments. These 
will be discussed in the remainder of 
this section. 

All the drug experiments to be con- 
sidered used rats or mice as subjects. 
This restriction was imposed because 
there is reason to believe that the meta- 
bolic processes underlying the systems 


to be considered differ markedly from 
species to species (e.g., Axelrod, 1954;
Spector, Shore, & Brodie, 1969). Although
related effects might be expected
in experiments in which other
species are used, the present considera- 
tions do not, strictly speaking, apply 
to such studies. The experiments to 
be discussed have been grouped according
to the three predictions mentioned previously.

1. Experiments by Hearst (1959) 
provided the basis for one comparison 
(Carlton, 1961b) of the effects of in- 
creased activation and decreased choli- 
nergic function. In the experiments 
by Carlton, food-deprived rats worked 
in a response chamber that contained 
two levers and a dipper that could 
deliver a small amount of milk. The
animals were required to alternate 
levers. Having pressed the right lever 
and been reinforced, the animal could 
obtain a subsequent reinforcement only 
by pressing the left lever, and so on. 
Hearst had reported that reductions 
in cholinergic activity with scopolamine 
produced a tendency for the animals 
to repeat rather than to alternate lever 
responses. 

The results of the comparison of de- 
creased cholinergic activity (with 
scopolamine or atropine) and increased 
activation (with amphetamine) are 
shown in Figure 1. Each of the drugs 
produced an increase in “perseverations.”
(The number of perseverations
equals the number of times a response 
to one lever was followed by a second 
response to that same lever. Persever- 
ations are expressed as a percentage of 
the total number of responses.) 
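
The perseveration measure just defined can be computed as in the following minimal sketch. The function name and the two response sequences are hypothetical and are given only to make the definition concrete; they are not data from the experiment.

```python
def perseveration_percentage(responses):
    """Percentage of responses that repeat the immediately preceding lever."""
    if not responses:
        raise ValueError("at least one response is required")
    # A perseveration is a response to the same lever as the one before it.
    perseverations = sum(
        1
        for previous, current in zip(responses, responses[1:])
        if previous == current
    )
    # Expressed, as in the text, as a percentage of all responses.
    return 100.0 * perseverations / len(responses)


# Hypothetical sequences of right (R) and left (L) lever presses.
perfect_alternation = ["R", "L", "R", "L", "R", "L"]
with_repeats = ["R", "R", "L", "L", "L", "R", "L", "L"]

print(perseveration_percentage(perfect_alternation))
print(perseveration_percentage(with_repeats))
```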

The congruence of the effects of the 
two anticholinergics suggests that 
Hearst’s original finding was, in fact, 
related to a block of the actions of 
ACH rather than to some idiosyncratic 
or unknown action of scopolamine. In- 
creased activation led to qualitatively 
similar effects. (The high potency of 
scopolamine relative to atropine is typi- 
cal and is presumably related to the 
slight difference in the chemical struc- 
tures of the two drugs.) 

In a second experiment (Carlton, 
1961b), atropine and amphetamine were 
compared in terms of their effects on 
behavior maintained by a schedule of 
reinforcement that differentially reinforced
low rates of responding. A
lever press was reinforced with milk 
only if it was emitted no sooner than 18 
seconds, and no later than 21 seconds, 
after the preceding response. 
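
The reinforcement rule of this schedule can be stated as a short sketch. The 18- and 21-second limits come from the text; the function name, the treatment of the first response of a session (taken here as unreinforced because it has no preceding response), and the response times are assumptions.

```python
def reinforced_responses(times, min_wait=18.0, max_wait=21.0):
    """Return one boolean per response: True if that response is reinforced.

    `times` are response times in seconds, in ascending order.
    """
    outcomes = []
    previous = None
    for t in times:
        if previous is None:
            # Assumption: the first response has no preceding response.
            outcomes.append(False)
        else:
            # Reinforced only if 18 to 21 s have elapsed since the last response.
            outcomes.append(min_wait <= t - previous <= max_wait)
        # Every response, reinforced or not, restarts the interval.
        previous = t
    return outcomes


# Hypothetical response times (seconds from the start of the session).
example_times = [0.0, 19.0, 29.0, 49.5, 72.0]
print(reinforced_responses(example_times))
```

A response emitted "too early" is thus doubly costly: it goes unreinforced and it postpones the next opportunity for reinforcement.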

Data from a portion of the sessions 
for one animal are shown in Figure 
2. In the upper graph the total num- 
bers of responses in each session have 
been plotted. Saline, atropine, or am- 
phetamine was given before each of the 
sessions. Each drug, but not saline, 
produced an increase in responding. 

Responses were tabulated on a sys- 
tem of counters so that each response 
was categorized in terms of the num- 
ber of seconds that had elapsed since 
the preceding response. Thus, a dis- 
tribution of interresponse times was 
obtained in each session. The medians 
of these distributions have been plotted 
in the lower portion of Figure 2. 
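
The tabulation described above amounts to the following sketch. The width of the counter categories is not given in the text, so the 3-second bin used here is an assumption, as are the function names and the response times.

```python
from collections import Counter
from statistics import median


def interresponse_times(times):
    """Seconds elapsed between each response and the one preceding it."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


def irt_distribution(irts, bin_seconds=3):
    """Count interresponse times per category, keyed by the bin's lower edge."""
    return Counter((irt // bin_seconds) * bin_seconds for irt in irts)


# Hypothetical response times (seconds) for part of one session.
session_times = [0, 20, 40, 45, 64]
irts = interresponse_times(session_times)

print(irts)
print(sorted(irt_distribution(irts).items()))
print(median(irts))
```

A shift of the median toward shorter values corresponds to the "too early" responding discussed in the next paragraph.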

After the animals had been given 
saline, their interresponse times tended 
to approximate the reinforcement con- 
tingencies imposed by the schedule. 
After the injection of atropine or am- 
phetamine, however, they tended to 
respond “too early.” Note that for 
both measures, a greater effect was ob- 
tained with the lower dose of amphet- 
amine. (Discussion of this inversion 
will be deferred to subsequent sec- 
tions.) In general, increased activa- 
tion and decreased cholinergic activity 
had similar effects. 

Reductions in activation produce, of 
course, reductions in behavioral out- 
put. As indicated earlier, meas- 
ures based on a simple depression of 
responding tend to be less useful be- 
cause there is no easy way of dis- 
criminating peripheral effects from ef- 
fects on the brain. One device that has 
been widely used in an effort to meet 
this problem is an avoidance procedure 
in which depression of avoidance re- 
sponding (during a preshock warning 
signal) is compared with depression of
escape responding (during the shock). 
The extent to which the depression of 
avoidance responding is greater than 
the depression of escape responding 
can be used as an estimate of the ex- 
tent to which a drug produces a specific 
central effect rather than a simple de- 
bilitation of the animal. (See Herz, 
1960, for a review of the effects of a 
wide variety of drugs on avoidance be- 
havior.) 
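
One way of expressing this comparison numerically is sketched below. The text does not specify a formula; the fractional-decrease measure, the function names, and the response counts are all assumptions used for illustration.

```python
def depression(baseline, after_drug):
    """Fractional decrease from baseline (0 = unchanged, 1 = abolished)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - after_drug) / baseline


def selective_avoidance_depression(avoid_base, avoid_drug, escape_base, escape_drug):
    """Excess of avoidance depression over escape depression.

    Values near zero suggest general debilitation; large positive values
    suggest a relatively specific effect on avoidance responding.
    """
    return depression(avoid_base, avoid_drug) - depression(escape_base, escape_drug)


# Hypothetical counts: avoidance falls from 80 to 20, escape from 100 to 90.
specific = selective_avoidance_depression(80, 20, 100, 90)
# Hypothetical debilitating drug: both measures fall by half.
general = selective_avoidance_depression(80, 40, 100, 50)

print(specific, general)
```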

At appropriate doses, reserpine and 
chlorpromazine rather specifically de- 
press avoidance behavior. If it is as- 
sumed that these drugs do so because 
they decrease activation, it is reason- 
able to suppose that increased antago- 
nism to activation due to increased 
cholinergic activity would have a simi- 
lar effect. 

Elevations of cholinergic activity 
with eserine have been shown to pro- 
duce a marked inhibition of avoidance 
responding similar to that produced by 
reserpine or chlorpromazine (Pfeiffer
& Jenney, 1957). Pfeiffer and Jenney 
also obtained such results with pilo- 
carpine and arecoline, drugs known 
to mimic the effects of ACH (Good- 
man & Gilman, 1955). Further, they 
have provided data that strongly sug- 
gest that these effects were due to an 
action on the central rather than per- 
ipheral nervous system. 

2. If a cholinergic system antagon- 
izes activation, increases in responding 
due to increased activation should be 
greater when cholinergic activity has 
been attenuated. It should be borne in 
mind that, for a particular response 
measure, it is possible to produce such 
slight changes in activation or in 
cholinergic activity that no observable 
effect results. In the experiment to be 
described, such “subthreshold” reduc- 
tions in cholinergic activity (obtained 
by adjusting the dose of atropine for 
each animal) were used throughout, 
Fig. 3. Avoidance responses per 20 minutes
for one animal following d-amphetamine
plus saline or d-amphetamine plus atropine 
at three d-amphetamine doses. (Each curve 
is the average of three determinations at 
each dose combination. Reprinted by per- 
mission of Williams and Wilkins Company.) 


whereas a wide range of increases in 
activation (with amphetamine) was 
studied. 

The animals had had extensive train- 
ing on an operant shock-avoidance 
schedule (Sidman, 1953). Representa- 
tive data for one rat are shown in Fig- 
ure 3. Each point is the number of 
lever presses emitted in each of the 
successive 20-minute periods of the 
5-hour postdose period (from Carl- 
ton & Didamo, 1961). In each session 
the animal was given 1 hour of exposure
to the schedule, removed from
the apparatus, injected (with amphet- 
amine, atropine, or amphetamine plus 
atropine), and replaced for the remaining
5 hours. The values plotted
at zero time were taken from the last 
20 minutes of the predose period. 
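
The points plotted in Figure 3 are counts of this kind. The sketch below shows how lever presses might be tallied into successive 20-minute periods of the 5-hour postdose session; the function name and the press times are hypothetical.

```python
def responses_per_period(press_minutes, period=20, session_minutes=300):
    """Count lever presses in each successive period of the postdose session.

    `press_minutes` are times of lever presses in minutes after injection.
    """
    n_periods = session_minutes // period
    counts = [0] * n_periods
    for minute in press_minutes:
        # Ignore anything recorded outside the postdose session.
        if 0 <= minute < session_minutes:
            counts[int(minute // period)] += 1
    return counts


# Hypothetical press times (minutes after injection).
presses = [1, 5, 19.9, 20, 45, 299]
counts = responses_per_period(presses)

print(len(counts))
print(counts[:3], counts[-1])
```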

The upper graph of the figure indi- 
cates that the increase in activation due 
to 0.25 mg/kg amphetamine did not 
increase responding, whereas the ad- 
ministration of this same dose of 
amphetamine with atropine produced a 
marked increase. Moreover, this dose 
of atropine, given alone, did not increase 
responding. Thus, the two drugs, 
each in a “subthreshold” dose, pro- 
duced marked effects if given together. 
Similar augmentations of the effects of 
the higher doses of amphetamine were 
obtained. Note, however, that the 
magnitude of the augmentation (great- 
est at 0.25 mg/kg amphetamine) de- 
clined at higher levels of activation. 
Also, higher levels of activation re- 
sulted in an increase in responding that 
was less than the increase obtained at 
lower levels (cf. Figure 2). 

In subsidiary experiments, equimolar 
doses of methyl atropine produced an 
augmentation much less than that pro- 
duced by atropine. For reasons dis- 
cussed previously, this finding suggests 
that the interaction of atropine and 
amphetamine was due to attenuation 
of cholinergic function in the central 
rather than peripheral nervous system. 

Related effects on gross motor ac- 
tivity have been reported by Tripod 
(1957) and by Frommel and Fleury 
(1960). Further, other anticholinergics,
scopolamine among them (Carlton,
1961a and unpublished experiments),
have been found to produce a
similar accentuation of the effects of 
amphetamine on avoidance responding. 

3. As indicated previously, am- 
phetamine, because it may mimic the 
action of the catecholamines, should reverse
the effects of a drug that produces
a reduction in activation. Attenuation
of inhibitory cholinergic activity,
on the other hand, acts only to
“release” the effects of whatever level 
of activation characterizes the animal 
when cholinergic activity is reduced. 
Thus, the effects of amphetamine and 
the anticholinergics should be similar 
in the normal animal but different 
when activation is low. Their similar- 
ity has already been demonstrated. 
Two experiments demonstrate their 
difference. 

In the first of these (Stein & Seifter, 
1960), animals trained on a variable- 
interval schedule for water reinforce- 
ment were given repeated injections of 
reserpine. The animals’ response rates 
became very low during the reserpine 
series, presumably because of the ef- 
fects of a depletion of catecholamines. 
This interpretation is strongly sup- 
ported by data obtained by Stein and 
Ray (1960a). The general results of 
their study have been replicated in 
unpublished experiments. 

Stein and Seifter found that low 
doses of amphetamine promptly and 
markedly increased the low rates seen 
after reserpine, whereas atropine and 
scopolamine, in a wide range of doses, 
produced only slight increases in 
responding. 

Similar effects have been reported by 
Tripod, Bein, and Meier (1954) in 
experiments in which the motor activ- 
ity of mice was measured. Both am- 
phetamine and scopolamine were found 
to produce increases in activity. The 
amphetamine and scopolamine doses
were selected so as to produce equal 
degrees of increased activity. Tripod 
et al. then gave increasing amounts of 
reserpine to determine the dose re- 
quired to antagonize the increased 
motor activity due to the drugs. Rela- 
tively low doses of reserpine attenuated 
the increase in activity due to scopola- 
mine to 50% of the value that was ob- 
tained when reserpine was not given; 
none of the doses of reserpine up to 
extremely high levels produced a comparable
reduction of the increased activity
recorded after amphetamine.
Thus, the increase in activation due to 
amphetamine could not be reversed by 
depletion of the catecholamines, pre- 
sumably because amphetamine literally 
mimicked their normal effect. In con- 
trast, attenuation of cholinergic activ- 
ity, normally as effective as the dose of 
amphetamine, was readily reversed by 
the reduction in activation due to 
reserpine. 


CHOLINERGIC ACTIVITY, ACTIVATION
AND THE EFFECTS OF
NONREINFORCEMENT


The experiments discussed in the 
preceding section tended to support the 
notion of an antagonism to activation 
by a cholinergic inhibitory system. 
They posed, however, an additional 
question: Why did the manipulations 
produce the particular effects on oper- 
ant behavior that they did? The 
various response measures were used 
because they appeared to be sensitive 
to the action of certain drugs that could 
be supposed to produce changes in ac- 
tivation or cholinergic activity. The 
analysis proceeded at a crude correla- 
tional level without regard to the 
mechanisms that might underlie the 
particular effects observed; it was
largely a matter of finding an effect of 
one drug and determining whether a 
second one would produce the expected 
result. 

A possible answer to the question 
was provided by a reconsideration of 
the effects of atropine, mentioned
earlier, and of other results obtained
in the study by Hearst (1959).
Details of this aspect of his experiment 
will be discussed subsequently; for the
present it need only be noted that 
Hearst reported data suggesting a 
block of extinction by scopolamine. 

In the atropine study it was noted
that this anticholinergic led to the
emission of behavior that hardly ever
occurred normally. These normally
inhibited responses were, in general, 
extraneous to optimizing shock-avoid- 
ance. In the light of Hearst’s data, 
they could be viewed as being those 
that were normally inhibited because 
they were nonreinforced. 

It thus appeared that some choliner- 
gic system antagonized the diffuse ef- 
fects of activation and that this antago- 
nism, first, might provide a basis for 
“selection” of the effects of activation 
and, second, was related to the extent 
that certain responses were correlated 
with nonreinforcement. Thus, level 
of activation could be viewed as con- 
trolling the tendency for all responses 
to occur, whereas an inhibitory choli- 
nergic system would act to antagonize 
this action on nonreinforced responses. 
The net result of this interaction would 
be that changes in activation would 
result in changes in the likelihood of 
occurrence of only a few responses, 
those that were reinforced. 

A cholinergic system thus appeared 
to be involved in the necessary and 
obviously adaptive “steering” function 
assumed to be lacking in activation. 
One simple, although rather artificial, 
way of viewing this function is to as- 
sume that (a) many responses are 
available to the animal in any situation 
and that (b) the likelihood of any one 
response occurring is directly related 
to the extent to which the strength of 
that one response is greater than the 
strength of the others. One variable 
affecting response strength is the de- 
gree to which a particular response has 
been reinforced. For descriptive sim- 
plicity, all responses could be assumed 
to be ordered on the abscissa of a 
graph, with response strength due to 
reinforcement plotted on the ordinate 
and a line with negative slope taken as 
describing the response strengths of the 
various responses. A plot of this kind
assumes that some responses, although 
not explicitly reinforced, do increase 
in strength because of generalization 
of reinforcement (see Reynolds, 1961). 

Decreased cholinergic activity could 
then be described as an increase in 
slope (i.e., a shift to a less negative 
value), with the y intercept, describing
the strength of the dominant response, 
remaining constant; decreased cholin- 
ergic activity would “flatten” the gradi- 
ent describing the strengths of the 
various responses. Such a shift would 
diminish the differences in strength 
between the dominant and other re- 
Fig. 4. A. Schematized representation of the effects of reduced cholinergic activity (at the
sponses. This diminution would de-
crease the likelihood of the dominant 
response occurring while increasing the 
likelihood of intrusion of initially non- 
dominant ones. Further, there would 
be an ordering of this effect such that 
responses having relatively greater 
strength under normal conditions 
would be more likely to intrude under 
conditions of decreased cholinergic 
activity. 

These relations have been schema- 
tized at the left of Figure 4A. Two 
levels of cholinergic activity, A and B, 
are shown. The differences between 
the dominant response, I on the 
abscissa, and a second response, II, are 
shown by a vertical arrow for Line 
A and by the vertical dashed arrow for 
Line B. It is apparent that the relative 
strength of I decreases with reduced 
cholinergic activity (the shift from A 
to B) and that the likelihood of an in- 
trusion of II, under Condition B, would 
be greater than that of some other 
response having an initially lower 
strength (e.g., III).
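
The relations schematized for Lines A and B can be checked with a small numerical sketch. The intercept and slopes below are arbitrary assumptions, as are the function names; only the qualitative pattern (a constant y intercept and a less negative slope under reduced cholinergic activity) is taken from the text.

```python
def response_strengths(intercept, slope, n_responses=3):
    """Strengths of responses I, II, III, ... along a linear gradient."""
    return [intercept + slope * rank for rank in range(n_responses)]


def dominance_margins(strengths):
    """Difference in strength between the dominant response and each other one."""
    dominant = strengths[0]
    return [dominant - strength for strength in strengths[1:]]


# Line A: normal cholinergic activity (steep negative slope, assumed values).
line_a = response_strengths(intercept=10, slope=-4)
# Line B: reduced cholinergic activity (same intercept, less negative slope).
line_b = response_strengths(intercept=10, slope=-1)

margins_a = dominance_margins(line_a)  # margins of I over II and over III
margins_b = dominance_margins(line_b)

print(line_a, margins_a)
print(line_b, margins_b)
```

Under Line B every margin is smaller than under Line A, and the margin of I over II remains smaller than that of I over III, which corresponds to the ordering of intrusions described above.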

The results of the alternation ex- 
periment summarized in Figure 1 bear 
on these relations. Since responses to 
both levers were extensively reinforced, 
it seems reasonable to suppose that, 
following the occurrence of one re- 
sponse, the most likely response would 
be to the opposite lever, whereas the 
next most likely one would be repeti- 
tion of the response just made. Thus, 
decreased cholinergic inhibition would 
tend to shift the level of alternation 
toward chance (50% in Figure 1). 

Similar effects were obtained in the 
experiment summarized in Figure 2. 
It may be supposed that a major com- 
ponent in the temporal discrimination 
that had developed under normal condi- 
tions was the inhibition of the tendency 
to respond “too early” since such responding
was nonreinforced. Accordingly,
attenuation of cholinergic activity
would result in the intrusion of such
early responses (i.e., median interresponse
times would be less, as in fact
they were). At low doses, there was 
also a relatively greater tendency for 
the probabilities of occurrence of re- 
sponses having interresponse times 
closer to the 18-21 second category to 
be increased. Related effects of at- 
tenuated cholinergic activity on be- 
havior maintained by a fixed-interval 
schedule of reinforcement have been 
reported by Herrnstein (1958), Brady 
(1959), and Boren and Navarro 
(1959). Boren (1962) has also re- 
ported that methyl scopolamine, which, 
like methyl atropine, passes into the 
brain very poorly, is much less potent 
than scopolamine in producing these 
effects. 

A somewhat more direct index of the 
effects of decreased cholinergic activity 
was obtained in an experiment in which 
the frequency of several similar re- 
sponses could be measured. Food- 
deprived rats were trained to emit a 
two-membered response-chain. The 
first member required the animal to 
touch any one of 12 metal buttons 
with its nose. (Responses to each 
button were recorded by a low-amp 
contact relay.) The buttons were 
mounted in a horizontal row in the wall 
at one end of the response chamber. 
Contact with any one of the buttons 
switched stimuli. In the second stimu- 
lus condition, the animal was rein- 
forced for crossing the chamber and 
pressing the lever mounted in the op- 
posite wall. The first stimulus condi- 
tion was reinstated coincident with 
reinforcement. The terminal response 
sequence was: touching any button, 
crossing the chamber and lever press- 
ing, consuming the reinforcement, re- 
crossing and touching a button, and 
so on. 
All animals developed consistent
position “preferences.”⁵ Representative
distributions of responses to the
12 buttons for two animals are 
shown at the left of Figure 4B; the 
effects of scopolamine are shown at the 
right. Under normal conditions, only 
a small proportion of the available re- 
sponses was actually emitted; the ani- 
mal “homed in” on a few of the 12 
buttons. However, responses more 
like the normally dominant one were 
more likely to occur under conditions 
of decreased cholinergic activity. Fur- 
ther, scopolamine did not decrease the 
total output of behavior ; total respond- 
ing was, in fact, increased. The in- 
creased “spread” of responding was 
also noted with amphetamine. 

These data suggest, incidentally, that 
the effects seen in the alternation sit- 
uation did not represent a reversion to 
a simple position “preference.” As 
the data in Figure 4B show, scopola- 
mine resulted in a shift away from 
normally “preferred” responses. 

The effects of activation have not 
been explicitly considered in this sec- 
tion. One way of treating these effects 
is in terms of a model like that ad- 
vanced by Broen and Storms (1961). 
Again, the likelihood of a response 
occurring may be taken as directly re- 
lated to its relative strength. Further, 
it may be assumed that the strength of 
any response has some maximum. 
Thus, in the graph at the right of 
Figure 4A, activation (on the abscissa)
is taken as increasing the strength of 
all responses (only three, I, II, and 
III, are shown) to a maximum (M). 

The figure indicates that increased 


5 The development of “preferences” was 
evidently related to uncontrolled contingen- 
cies of adventitious reinforcement (i.e., it was 
“superstitious”). These animals had been 
trained for other purposes (cf. Carlton, in 
press); it would have been better, for the 
present discussion, if responses to all but one 
button had been explicitly nonreinforced. 
activation, to A on the abscissa, will
increase the relative strength of I, but 
that once I has reached a maximum (at 
A), this difference will be diminished. 
Further, the likelihood of intrusion of 
II upon I will, at levels of activation 
in excess of A, be greater than that of 
III. The difference in the slopes of 
Lines I, II, and III can, of course, be 
taken as being determined by a variety 
of factors operative in any experimental 
situation, the extent of reinforcement 
of each response being one of them. 
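
The ceiling relation at the right of Figure 4A can likewise be sketched numerically. The slopes, the maximum M, and the assumption that each line rises linearly from the origin are illustrative choices, not values taken from the figure.

```python
def strength(activation, slope, maximum):
    """Response strength rises linearly with activation up to a ceiling M."""
    return min(maximum, slope * activation)


M = 6.0                                     # assumed common maximum
SLOPES = {"I": 3.0, "II": 2.0, "III": 1.0}  # assumed slopes; I is dominant


def margin(activation, other):
    """Strength of I minus the strength of another response."""
    return strength(activation, SLOPES["I"], M) - strength(activation, SLOPES[other], M)


# With these values, response I reaches M at activation A = 2.0.
A = M / SLOPES["I"]

for level in (1.0, A, 2.5, 3.0):
    print(level, margin(level, "II"), margin(level, "III"))
```

Up to A the margin of I over II grows; beyond A it shrinks, and it stays smaller than the margin of I over III, so that II is the more likely intruder.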

Since the level of activation in all 
of the experiments previously discussed 
may be assumed to have been quite 
high under normal conditions, it is 
reasonable to suppose that further in- 
creases would result in a shift to the 
right of A in Figure 4A. Thus, de- 
creased cholinergic activity and in- 
creased activation should have pro- 
duced, as they did, qualitatively similar 
effects. Further, concurrent manipula- 
tions of this kind would be expected 
to augment one another (see Figure 3). 

There are three addenda to these 
considerations that should be empha- 
sized. First, the use of the graphs in 
Figure 4A should be taken as con- 
tributing neither mathematical rigor 
nor nicety to the present discussion. 
If taken too seriously, these schematized 
relations may lead to gross inconsis- 
tencies; this possibility has not been 
critically examined. They were used
only as maximally simple and there- 
fore crude diagrams that would, hope- 
fully, help to clarify the kinds of 
relations discussed. 

A second point is that these consid- 
erations amount to little more than an 
analysis in terms of a loss of the control 
of behavior normally exerted by two 
of the many variables involved in any 
reinforcement situation. As such, the 
present analysis has a formal similarity 
to others that have used the loss of 
“stimulus control” as an organizing 
construct in the discussion of drug
effects. The general utility of this 
kind of analysis is best illustrated in a 
recent discussion by Dews and Morse 
(1961). 

Finally, it should be noted that al- 
though manipulations affecting activa- 
tion and those controlling level of 
deprivation cannot be uncritically
equated, there is some basis for the use 
of a model like that proposed by 
Broen and Storms in their analysis of 
the effects of deprivation. Evidence 
suggesting that changes in deprivation 
result in changes in reticular activity 
and thereby in activation has been dis- 
cussed by Dell (1958). Further, the 
EEG signs of activation have been 
found to be a consequence of increased 
deprivation (Steiner, 1960). Broen 
and Storms also discuss a number of 
experiments in which extreme increases 
in deprivation (see also Malmo, 1959) 
resulted in a decline in responding. 
Related effects may be consistently 
noted with extreme increases in activa- 
tion or decreases in cholinergic ac- 
tivity. These may reflect the intrusion 
of extremely remote responses that are 
incompatible with the normal chain of 
responding (see, however, the last sec-
tion). A pertinent finding, reported
by Boff and Heise (1961), is that 
whereas low doses of amphetamine, 
atropine, or scopolamine increased 
avoidance responding, higher doses re- 
sulted in a decline (cf. Figures 2 and 
3). More to the point, they found 
that the drug-induced decline in re- 
sponding tended to occur at lower 
doses when a larger response chamber 
was used. Thus, the effects of ex-
treme increases in activation or de-
creases in cholinergic activity appear
to have been partly determined by the 
opportunity for the intrusion of re- 
sponses other than those involved in 
lever pressing. 


CHOLINERGIC MEDIATION OF THE 
EFFECTS OF NONREINFORCEMENT 


The preceding considerations place 
particular emphasis on the involvement 
of a cholinergic system in the mediation 
of the effects of nonreinforcement. 
Several experiments bear directly on 
this notion.

The first of these, based on a tech- 
nique described by Stein and Ray 
(1960b), involved self-stimulation of 
reward areas of the brain. The pro- 
cedure was adopted because it seemed 
to be an extremely sensitive and precise 
means of assessing the effects of non- 
reinforcement. 

Animals with electrodes implanted 
in the preoptic area were trained to 
self-stimulate (by lever pressing). 
After every eleventh response, the cur- 
rent was decreased by one step. The 
progressively reduced current resulted 
in a decline in responding to a point 
at which fewer than 11 responses 
occurred in 30 seconds. At this 
point the current was increased by 
one step and the animal was given 
one response-independent stimulation.
This “free” stimulation was used to 
signal the increase in current. The 
procedure thus allowed the animal to 
“adjust” the stimulating current 
around that low level that just main- 
tained responding. The value at which 
the current was increased by one step 
provided an estimate of the upper 
limit (in milliamperes) of the range 
of nonreinforcing current-levels. 
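
The titration schedule just described can be sketched in code. The following is only an illustrative simulation under stated assumptions: the model animal, its response counts, the step units, and the function names are hypothetical and do not come from the experiment, and the 30-second criterion is assumed to restart after each change in current. The median of the reset values mirrors the treatment of the data by 10-minute periods.

```python
from statistics import median

RESPONSES_PER_DECREMENT = 11   # current drops one step after every 11th response
CRITERION_SECONDS = 30.0       # fewer than 11 responses in 30 s triggers a reset


def titrate(responses_in_30s, start_step, session_seconds):
    """Simulate the schedule; return (time, step) for each upward reset.

    responses_in_30s(step) is a hypothetical model of the animal: the number
    of responses it would emit in 30 seconds at a given current step.
    """
    step = start_step
    elapsed = 0.0
    resets = []
    while elapsed < session_seconds:
        n = responses_in_30s(step)
        if n >= RESPONSES_PER_DECREMENT:
            # The animal completes 11 responses, so the current is decreased.
            elapsed += CRITERION_SECONDS * RESPONSES_PER_DECREMENT / n
            step -= 1
        else:
            # Responding has lapsed: record the step, then raise the current
            # (the "free" stimulation would be delivered at this point).
            elapsed += CRITERION_SECONDS
            resets.append((elapsed, step))
            step += 1
    return resets


def model_animal(threshold_step):
    """Hypothetical animal: responds briskly only above a threshold step."""
    def responses_in_30s(step):
        return 20 if step > threshold_step else 3
    return responses_in_30s


# One simulated 10-minute period; the estimate is the median reset value.
resets = titrate(model_animal(threshold_step=5), start_step=10, session_seconds=600)
estimate = median(step for _, step in resets)
```

In this sketch the current oscillates around the model animal's threshold, which is the sense in which the animal "adjusts" the current; a drug effect such as that reported for scopolamine would correspond to a lower threshold in the hypothetical model.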

Response data were treated in terms 
of the median estimate in each of the 
successive 10-minute periods of the 
session. The animals were allowed 
two such periods, were injected, and 
were then given seven additional 
periods. Representative data from two 
sessions for one animal have been pre-
sented in Figure 5. Under saline condi-
tions, responding stopped at a current
level that either was constant or rose
slightly throughout the session; the 
effect of scopolamine was a marked and 
reversible decrease in the current re- 
quired to maintain responding. Thus, 
responses that were normally inhibited 
because of their correlation with non- 
reinforcement were emitted under 
conditions of attenuated cholinergic 
function. 

In a second set of experiments 
(Brady, 1959; Herrnstein, 1958), in- 
creased responding during the extinc- 
tion component of a multiple-schedule 
was reported to be a consequence of 
scopolamine administration. 

In the previously mentioned study by 
Hearst (1959), animals were trained 
to “wait,” not respond, for a given 
period, after which one of two auditory 
stimuli was presented. Reinforcement 
was delivered to the animal only if it 
pressed a particular lever of the two 
available when one stimulus was on and 
pressed the other when the other stim- 
ulus was on. 

The animals normally responded ap- 
propriately to the levers during stimu- 
lus periods and emitted few responses 
between them. When given scopola- 
mine, however, they emitted many re- 
sponses between periods and tended 
to perseverate in their responding to
one lever, regardless of which stimulus
was on (cf. Figure 1). 

The animals were subsequently given 
a series of extinction sessions, during 
which responding declined. The ani- 
mals were continued on extinction but 
were then given injections of scopola- 
mine before each session. Hearst 
found that (a) levels of responding 
returned to those obtained under the 
drug before extinction, (b) this be- 
havior was also characterized by a 
tendency to perseverate and to re- 
spond between stimulus periods, and 
(c) continued extinction under scopol- 
amine (for thousands of nonreinforced 
responses) failed to result in a decline 
in performance. He also reported that 
when the scopolamine injections were 
discontinued, performance dropped to 
the low levels that had been obtained 
before scopolamine injections were 
begun. 

The importance of Hearst’s findings 
prompted an extension of his experi- 
ment, Rats that had had extensive 
exposure to a Sidman-type avoidance 
schedule were assigned to one of three 
groups. One of these received saline 
before each extinction session, the sec- 
ond received scopolamine (0.6 mg/kg), 
and the third received amphetamine 
(1.0 mg/kg). The animals were given 
90-minute sessions either weekly or 
biweekly. In the first session, the 
avoidance schedule was in effect for 
the initial 30 minutes (Session 1-A). 
The animals were then injected and 
the shock circuit was disconnected for 
the remaining hour (Session 1-B). 
Conditions in all subsequent sessions 
were identical to those in 1-B. 

As expected, the rates of the animals 
given amphetamine increased markedly 
in Session 1-B. To correct for this 
increase, i.e., to equate as much as
possible the effects of the injections on 
performance, the subsequent data for 
each animal were expressed as a per-
centage of the number of responses
made in Session 1-B. (The numbers 
of responses emitted in this first ex- 
tinction session, as a percentage of the 
numbers emitted in the control session 
preceding Session 1, were 81, 111, and 
278% for the saline, scopolamine, and 
amphetamine groups, respectively. In 
this control session the average rates 
of responding for the three groups were 
9.6, 9.8, and 11.2 responses per minute, 
respectively.) 
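
The correction described above amounts to dividing each animal's responses in a session by its own total in Session 1-B. A minimal sketch of that arithmetic follows; the response counts are hypothetical placeholders chosen only to show the calculation and are not data from the experiment.

```python
def percent_of_baseline(session_counts, baseline_count):
    """Express each session's response count as a percentage of a baseline.

    The baseline is the animal's own response count in the first extinction
    session (Session 1-B in the text).
    """
    if baseline_count <= 0:
        raise ValueError("baseline count must be positive")
    return [100.0 * count / baseline_count for count in session_counts]


# Hypothetical counts for one animal: Session 1-B followed by two later sessions.
hypothetical_counts = [400, 200, 100]
percentages = percent_of_baseline(hypothetical_counts, hypothetical_counts[0])
# The first entry is 100 by construction, so groups that differ in their
# initial drug-induced rates start from a common point.
```

Because the division is carried out animal by animal, the group curves are averages of percentages and not percentages of group totals.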

The averaged percentages have been 
entered in Figure 6. The saline ani-
mals extinguished rapidly; the amphet-
amine animals extinguished more
slowly; and the scopolamine group 
showed an initial decline in perform- 
ance, followed by a rise to a level equal 
to that obtained in the first session. 
The values obtained from Festinger’s 
d test for each session have been en-
tered at the top of the figure. In the
terminal sessions, the amphetamine and 
saline groups did not differ signifi- 
cantly, whereas the responding of the 
scopolamine group was significantly 
higher than both of these. 

Although sessions were 1 or 2 
weeks apart, the decline in responding 
of the amphetamine group could have 
been due to the development of toler- 
ance to the drug. As a check on this 
possibility, these animals were not 
given sessions for at least 1 month 
after the completion of the main experi- 
ment. When the effects of amphet- 
amine were again evaluated in the 
avoidance situation, the rates of re- 
sponding were essentially the same as 
those that had been recorded in the 
terminal sessions of the main experi- 
ment. 

These data bear directly on the
mechanisms previously discussed. The
tendency for amphetamine to increase 
responding may be taken as being 
related to an increase in activation. 
Cholinergic activity, on the other hand, 
is presumed to be involved in the medi- 
ation of the effects of nonreinforce- 
ment. Thus, although increased acti- 
vation and decreased cholinergic 
activity have similar effects on per- 
formance, they should have different 
effects in an extinction situation. 

It should be borne in mind that the 
data in Figure 6 are averages. Al- 
though the responding of the individual 
animals in the scopolamine group 
tended to be maintained, they were in 
no sense responding “as if” extinc- 
tion had not been begun at all; i.e., they 
did not respond at the low, stable rates 
characteristic of normal performance. 
One obvious and relevant complica- 
tion is that the doses of scopolamine 
used would have severely disrupted the 
responding of animals in a performance 
situation; the use of percentages in 
Figure 6 only very partially corrects 
for this. 

A similar complication was inherent 
in the data obtained by Hearst. His 
animals continued to respond in extinc- 
tion but they were not responding as 
if they had not been given scopolamine 
and reinforcements had not been dis- 
continued. Rather, the general charac- 
ter of responding was much like that 
observed when the animals had been 
given scopolamine while reinforcements 
were being delivered. In general, then, 
the effects of attenuated cholinergic 
activity appear to reduce the normal ef- 
fects of nonreinforcement and, as such, 
produce a variety of effects like those 
discussed previously. In addition 
to these effects, there is a block of the 
decline in overall responding typically 
observed during extinction. 

It should also be borne in mind that 
the shift from reinforcement to non-
reinforcement probably produces a
number of disruptive effects not related 
to the role of cholinergic functioning 
as it is being considered here. Thus, 
the measures of the effects of nonrein- 
forcement used in Hearst’s study and 
in the one summarized in Figure 6 
were rather “impure” (see also the 
final section). The data obtained in 
these two experiments do, nonetheless, 
provide a very preliminary basis for 
supposing that cholinergic activity is 
involved in the mediation of the ef-
fects of directly imposed nonreinforce-
ment. What is needed, clearly, is a
relatively less contaminated measure;
procedures that may meet this need are
currently under study.

If a cholinergic system does play an 
important role in the extinction process, 
and to the extent that the extinction of 
irrelevant, competing responses is in- 
volved in the acquisition of a particu- 
lar pattern of behavior, it is to be 
expected that variations in cholinergic 
activity should be reflected in corre- 
sponding changes in acquisition. Two 
experiments relate to this point. 

One of these (Carlton, 1961b) in- 
volved a two-membered response-chain 
like that in the experiment summarized 
in Figure 4. Rats were required to 
displace any one of five plastic plates 
(response “keys”), cross the response
chamber, and depress the lever mounted 
in the opposite wall. The circuit was 
arranged so that (a) a lever depression 
was required after each key response, 
and (b) only after they had displaced 
the one correct key were they rein- 
forced for depressing the lever. Thus, 
the animals were required to traverse 
the chamber repeatedly and to learn, on 
a trial and error basis, which of the 
five keys was correct. After 20 rein- 
forcements on one key, another was 
made the correct one. After 20 rein- 
forcements on that one, still another 
was correct, and so on. All five
keys were correct in each session; the
order in which they were correct was 
randomized daily. 
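
The contingencies of this procedure can be sketched as a short simulation. It is offered only as an illustration: the "win-stay, lose-shift" strategy is a hypothetical stand-in for the animal and is not a model of any drug effect, and the function names are assumptions introduced here. The lever press that follows each key response is not modeled separately; a correct key response is simply counted as one reinforcement.

```python
import random

N_KEYS = 5
REINFORCEMENTS_PER_KEY = 20


def run_session(choose_key, rng):
    """Run one session and return (key order, errors, reinforcements).

    Each of the five keys is the correct one in turn, in an order randomized
    for the session; the correct key changes after 20 reinforcements.
    """
    order = list(range(N_KEYS))
    rng.shuffle(order)
    errors = 0
    reinforcements = 0
    last_outcome = None  # (key pressed, whether it was reinforced)
    for correct_key in order:
        earned = 0
        while earned < REINFORCEMENTS_PER_KEY:
            key = choose_key(last_outcome, rng)
            if key == correct_key:
                earned += 1
                last_outcome = (key, True)
            else:
                errors += 1  # an incorrect key response
                last_outcome = (key, False)
        reinforcements += earned
    return order, errors, reinforcements


def win_stay_lose_shift(last_outcome, rng):
    """Hypothetical strategy: repeat a reinforced key, otherwise try another."""
    if last_outcome is None:
        return rng.randrange(N_KEYS)
    key, reinforced = last_outcome
    if reinforced:
        return key
    return rng.choice([k for k in range(N_KEYS) if k != key])


order, errors, reinforcements = run_session(win_stay_lose_shift, random.Random(0))
```

Even this idealized strategy makes at least one error at every change of the correct key, since the previously reinforced response must go unreinforced once before it is abandoned. Slower abandonment of the old key, as would be expected if the effects of nonreinforcement were attenuated, would add to the error count.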

The effects of scopolamine, atropine, 
and amphetamine are shown in Figure 
7. The averaged numbers of incorrect 
responses have been entered in the 
figure. All three drugs increased the 
number of errors the animals made in 
the course of obtaining 20 reinforce- 
ments on each key. These effects can 
be related to the increased probability 
of intrusion of incorrect responses due 
to an increase in activation with am- 
phetamine, on the one hand, and to an 
attenuation of the usual effects of non- 
reinforcement with the anticholinergics, 
on the other. 

This experimental procedure places, 
of course, exclusive weight on the 
speed with which the animal switches 
from one key response when it is no 
longer reinforced. Rather similar ef- 
fects have, however, been reported in a 
study in which a T maze, a more tradi- 
tional device for the study of learning, 
was used (Whitehouse, 1959). 

It is reasonable to suppose that learn- 
ing to make a correct “choice” in a 
T maze involves, to some extent, the 
extinction of the tendency to make the 
wrong one. In the study by White-
house, it was found that reduction in
cholinergic activity with atropine sig- 
nificantly decreased the rate at which 
rats learned a discrimination problem 
in the maze. Whitehouse also obtained 
data indicating that rate of learning 
was increased when cholinergic activ- 
ity was increased with eserine. This 
difference was not statistically reliable. 

In a preliminary study, Whitehouse 
had, however, found reliably faster 
learning by animals given eserine. The 
difficulty of the discrimination problem 
in this study was greater than that in 
the main experiment. It is to be ex- 
pected that, because of the greater 
likelihood of intrusion of incorrect re-
sponses, the learning of a more diffi-
cult problem would be more sensitive
to the effects of increased cholinergic 
activity. As Whitehouse points out, 
the results of this study are in question 
because precautions were not taken 
against the deterioration of eserine be- 
fore injection. However, the decline 
in the drug’s potency is not great dur- 
ing the initial phase of deterioration. 
Thus, it may have been that the im- 
proved learning of the more difficult 
problem was related to increased choli- 
nergic activity at some indeterminant 
level. It is unfortunate that this prob- 
lem, rather than the simpler one, was 
not used in the main experiment. 


GENERAL CONSIDERATIONS 


The preceding discussion was not 
aimed at the pharmacological problem 
of uniquely characterizing the actions 
of certain drugs. The problem was, 
rather, to use certain judiciously se- 
lected drugs on the assumption that 
their effects would provide some insight 
into factors involved in the control of 
normal behavior by the brain. 

The relation between drug action 
and behavioral effect is not, at our 
present state of knowledge, a sym-
metrical one: a communality of be-
havioral effect does not necessarily
imply a communality of the mechanism
of action of the drugs. Thus, the fact 
that a drug produces a behavioral effect 
similar to that of scopolamine, for ex- 
ample, indicates only that scopolamine 
has not been uniquely characterized;
this fact, however, has little bearing on 
the possible role of ACH in the brain. 
In contrast, the fact that it is fairly 
reasonable to assume that the action 
of scopolamine is to block the action 
of ACH suggests that the behavioral 
effects of scopolamine reflect the role 
of some cholinergic system involved in 
the control of behavior. 

In contrast to the reasonable assur- 
ance with which certain drugs may be 
assumed to affect ACH activity, the 
mechanisms of action of those drugs 
presumed to affect activation are any- 
thing but clear. Because their action 
has not been worked out in any detail, 
interpretations of their relation to the 
catecholamines, to the reticular for- 
mation, and to activation must be 
viewed with extreme caution. Their 
use has, nonetheless, shed some light 
on the possible properties and functions 
of a cholinergic system in the brain. 

Another complication is that neither 
the pharmacological manipulations nor 
the behavior indices discussed are 
“pure”; both are contaminated by fac- 
tors not explicitly considered here. 
The previously mentioned lack of sym- 
metry in the drug-behavior relation is 
due to such contamination; a change 
in behavior may be produced by any 
one of several variables and any one 
drug may have several actions. Two 
of many possible examples illustrate 
the point. Amphetamine has been as- 
sumed to increase activation; among 
other actions, it also produces anorexia 
and may accentuate “fear-like” vari- 
ables that control behavior. (Data 
presented by Brady, 1957, and Geller & 
Seifter, 1960, can be interpreted as a 
reflection of this action.) 


In a food-reinforcement situation, 
the anorexigenic action of amphetamine 
may tend to counteract its activating 
effect. In Figure 2, the inversion of 
amphetamine-effect may have been due 
either to anorexia or to an increase in 
activation to such an extreme degree 
that responses incompatible with lever 
pressing occurred and therefore event- 
uated in a decline in responding (see 
p. 30). In support of the second 
possibility, a similar inversion of effect, 
which cannot be assigned to anorexia, 
has been noted in avoidance experi- 
ments. There is, however, no simple 
way to evaluate the relative contribu- 
tion of these two factors. 

As a second illustration, it may be 
noted that amphetamine produces in- 
creases in avoidance responding far 
greater than those obtainable with 
scopolamine. In contrast, scopolamine 
was found to be much more potent 
than amphetamine in affecting several 
positively reinforced behaviors. Is this 
differential action due to a “fear” com- 
ponent in the action of amphetamine 
that makes avoidance an especially sen- 
sitive indicator, or to an anorexigenic 
action not shared by scopolamine, or to 
both? Again, no simple answer is 
available. 

One point that comes out of this un- 
certainty is that any interpretation of 
the effects of a drug must consider all 
of its actions. A second point is that 
an interpretation of the significance of 
a drug effect that is related to only one 
of the drug’s possible actions cannot 
be expected to account for all of the 
effects obtainable with that drug. 

The upshot of all this is that there 
can be no experimentum crucis in so 
complex an area. The likelihood of a 
point of view being correct can be 
assessed only in terms of the total 
weight of indirect evidence in its favor. 
The results of any single experiment 
may be unique to the particular param-
eters employed or may be due to a
wholly idiosyncratic and largely irrele- 
vant action of the drug. This comes 
down to saying that what is required 
are evaluations of selected drugs in a 
reasonably wide range of doses in a 
variety of behavioral situations (cf. 
Dews & Morse, 1961; Miller, 1956). 
Only with such information can one 
hope to undertake analyses aimed at 
abstracting some homogeneity of effect 
from the heterogeneity of drug action. 


It is proposed that innate visual perceptual and preferential choice
behavior, forming the basis for release of fixed action patterns in many
species including humans, is organized and integrated neurally by the
retinal ganglion cells. A neural mechanism based on the differential
sensitivity of these cells to specific aspects of stimulation, with cells
exhibiting a differential rate of maturation paralleling the maturation
of certain behavior, is developed. It is hypothesized that this
mechanism is responsible for onset of the imprinting critical period,
inborn preferential choice responses, innate object recognition, and the
stimulus-specific releasing function involved in fixed action patterns.
Evidence supporting the neural model is presented, and specific
examples of relevant behavior are explained in terms of the model.


Recent developments in ethology and
comparative psychology point to the
existence of a large repertoire of com-
plex, highly organized innate behavior
patterns in a wide phylogenetic range
of species (e.g., Thorpe, 1956; Thorpe
& Zangwill, 1961; Tinbergen, 1951).
Closely related to these findings
are phenomena involved in critical
period learning (Hess, 1959; Jaynes,
1957; Moltz, 1960), and in the ma-
turation and development of prefer-
ential responses to visual stimulation
(Fantz, 1957, 1958, 1961). Al-
though many writers in psychology
acknowledge the existence of such un-
learned, developmental, and critical
period phenomena in their theories, in
practice they rely almost entirely on
learned modification of response pat-
terns in explaining and predicting be-
havior. Thus, basic innate, perceptual,
and maturational aspects of behavior
seem to be largely ignored, particularly
in explanations of human behavior.
Further, the problem of exact identifi-
cation and specification of the neuro-
physiological structures and functions
serving to organize and integrate com-
plex unlearned behaviors has received
almost no attention from a psychologi-
cal point of view.


1 This paper was written while the author
held U. S. Public Health Service Predoctoral
Fellowship MF-10,710. Sincere appreciation
is expressed to O. T. Law for advice and
criticism during the development and prepa-
ration of ideas presented in this paper, and
to USPHS grant M-5207 for partial support.

This paper will attempt to integrate 
examples of unlearned behavior, criti- 
cal period learning, and maturation of 
preferential choice and discriminatory 
responses with neurophysiological evi- 
dence for a neural mechanism concern- 
ing development and mediation of such 
behavior. The major purpose of the 
paper is to suggest that a theory based 
on a peripheral mechanism for neural 
encoding and transmitting of informa- 
tion from specific environmental stimu- 
lus attributes helps explain the behavior 
referred to above. 


FIXED ACTION PATTERNS
IN BEHAVIOR


The phenomena to be discussed be-
low constitute a class of behavior re-
ferred to as Fixed Action Patterns
(FAP). As employed here, the gen-
eral meaning of the term is similar to
the concept of instinct, but without the 
usual teleological or motivational im- 
plications of the latter. In Ethology, 
the concept of FAP is defined as “rigid 
and stereotyped actions [that are] often 
the end-point or climax of a series of 
instinctive actions the earlier elements 
of which might be highly flexible” 
(Thorpe & Zangwill, 1961, p. 89). 
Thus, in Ethological usage the FAP is 
all but equated with the concept of a 
species-specific consummatory response 
usually serving to terminate a segment 
of ongoing behavior. 

However, the meaning of FAP in 
the present paper will be expanded to 
include not only species-specific con- 
summatory behavior, but also the type 
of relationship conceptualized in learn- 
ing theory as an sUr. Thus, in its 
broadest sense, the FAP concept here 
refers to any unconditioned response 
elicited by an unconditioned stimulus. 
In particular, the paper deals with 
visual stimuli serving to evoke species- 
specific behaviors that are relatively 
impervious to the effects of training. 


Consummatory Behavior in the Toad 


Anecdotal and experimental evidence 
suggests that the toad and the frog will 
strike at, capture, and ingest only a 
moving, relatively small, stimulus ob- 
ject. Thus, a nonmoving object ap- 
pears to be behaviorally meaningless 
for the toad and frog with respect to 
consummatory responses. That move-
ment per se is a necessary condition
for evoking this response was demon- 
strated by the finding that operations 
known to produce visual illusions of 
movement in humans also serve to 
elicit the striking-capture-ingestion con- 
summatory sequence in the toad (Kaess 
& Kaess, 1960). If both the toad and 
a food object are stationary but the 


background environment moves, or if
the toad and a food stimulus object are 
moving together at a constant velocity 
in a stationary environment, the toad 
performs the consummatory response. 
This consummatory response appar-
ently does not occur, under any depri-
vation schedule, unless the food stimu-
lus actually moves or appears to move
(illusory movement); given either of
these necessary conditions, however,
the response occurs automatically.


Food-Begging Behavior in Herring 
Gull Chicks 


In the Herring Gull, newly hatched 
chicks get food by pecking at the tip 
of the parent’s bill. This response 
leads the parent to present food to the 
chick. The bill of the parent Gull is
yellow, with a red spot at the lower 
end. By the appropriate use of models, 
Tinbergen (1951) compared the chick’s 
response to a cardboard dummy in nat- 
ural colors to the response to a dummy 
lacking the red spot. The presence of 
the red spot was critical for the release 
of the begging response. In further 
tests it was found that a model having 
a spot of any color results in a greater 
number of begging responses than does 
a model with no spot. These results 
lead to the conclusion that contrast 
between bill color and spot is a critical 
factor in eliciting the begging response. 
Further, the finding that red has more
“releasing value” than any other color,
including black, indicated that red, as
a color, had an especially potent in- 
fluence on the chick’s behavior. A 
second experiment (Tinbergen & Per- 
deck, 1950) indicated that neither color 
of the bill nor color of the head had 
any value as a releasing stimulus. 
Thus, responsiveness to visual contrast, 
especially contrast involving red on a 
yellow background, appears to be in-
born in young Herring Gull chicks, and
serves as the necessary stimulus condi-
tion for elicitation of the begging
response. 


Supernormal Releasing Stimuli 


A stimulus which is even more effec- 
tive than the natural stimulus in evok- 
ing a FAP is called a supernormal re- 
leaser (Tinbergen, 1951). Examples 
of such stimuli are found in the egg 
nesting and egg rolling behavior of 
some avian species. An instance of a 
supernormal releaser is the “egg recog- 
nition” or egg preference behavior of 
the ringed plover (Koehler & Zagarus, 
in Tinbergen, 1951, p. 44). When pre- 
sented with a choice between a normal 
egg, which is light brown with dark 
spots, and an egg with a clear white 
background and black dots, the birds 
clearly preferred the latter, even though 
it differs markedly from the stimulus 
attributes of the egg as found in the 
natural environment. Similarly, Oys- 
ter Catchers prefer a clutch of five eggs 
to the normal clutch of three. If pre- 
sented with a normal sized egg and an 
egg approximately four times the size 
of the normal egg, the Oyster Catcher 
prefers the larger egg (Tinbergen, 
1951). 

In the case of the plover, visual con- 
trast, in the form of clearly distinctive 
black dots on a white ground, seems to 
be a major factor in eliciting preferen- 
tial egg setting responses. In the case 
of the Oyster Catcher, physical size 
(i.e., greater periphery) of the stimulus 
object appears to be a major determi- 
nant of egg preference. 


Development of Form and Pattern 
Preferences in Human Babies and 
Chicks 


Examples of the maturation of visual 
preference behavior, here considered to 
be FAPs, in human babies, chicks, and 
chimpanzees are found in several 
studies by Fantz (1957, 1958, 1961). 


Among other things, these studies indi- 
cate that form and pattern perception 
are inborn in these species, depending 
mainly on maturation and visual ex- 
perience at certain critical ages for 
normal development. Further, these 
studies indicate that the development 
of form and pattern perception, and 
preferential choice responses based on 
visual form and pattern, proceeds from 
simpler to more complex stimulation. 
The method employed by Fantz in 
his studies of human infants was to 
place pairs of stimuli (e.g., a large tri- 
angle with a small triangle, a circle 
with a cross, a striped pattern with a 
bull’s eye pattern, or two solid colored 
disks) above the infant's crib, record- 
ing the amount of time the infant fix- 
ated on each stimulus in a standard 
testing period. Testing continued for 
a period of 2 months. The results 
show that the younger infants spent 
more time looking at simpler stimuli 
than did older infants (e.g., 2-3
month old infants spent more time 
looking at solid colored disks than 
did infants over 3 months); while 
the older infants spent more time look- 
ing at complex stimuli than did the 
younger infants (e.g., infants over
3 months old preferred to look at a
bull’s eye pattern and a “face” rather 
than solid disks, while the opposite 
was the case for infants under 3 
months). The fact that pattern per- 
ception is present in the human infant 
as early as 2 weeks of age was 
demonstrated by the finding that the 
baby showed a definite preference for 
a striped pattern over a bull’s eye at 
this age. The development of prefer- 
ential choice responses was also seen 
in the finding that the choice responses 
changed from the striped to the bull’s 
eye pattern by about 12 weeks of age. 
Fantz (1957) has also shown that 
the chick has an inborn preference for 
the rounder of two objects, as indicated
by differential pecking responses when
the subject was presented with two 
solid-block stimuli differing in degree 
of angularity. Shapes tested included 
a round ball, a cylinder, a square, and 
an equilateral triangle. Results showed 
that chicks preferred to peck at the 
ball more than any other object, while 
the triangle evoked fewer pecking re- 
sponses than all other stimuli. These 
results suggest that degree of contour 
of stimuli (i.e., amount of curvature
in this instance) might be a critical
factor in the chicks’ pecking prefer-
ences, and, by inference, in their basic
perceptual responsiveness. 


Imprinting 


The Fantz experiments serve as the 
primary examples of developmental 
aspects of a FAP in humans. A sec- 
ond phenomenon concerning percept- 
ual development with critical periods 
for maximum utilization of visual ex- 
perience is imprinting. This concept 
refers to relatively permanent, short- 
term (or even one trial) learning, 
which occurs early in the life of the 
organism. 

Imprinting is usually measured by 
the strength with which the subject 
will follow a moving stimulus object 
during a standard time period. Studies 
by Jaynes (1957) and by Hess (1959) 
lead to the conclusion that imprinting 
does in fact occur only at critical 
periods. These periods ranged from 
approximately 16 to 24 hours after 
hatching in most species tested. Fur- 
ther, a study by Gottlieb (1961) estab- 
lished that the particular age for im- 
printing in any individual subject is 
largely determined by the maturation 
of the organism from conception, and 
not simply by the time since hatching. 
Thus, imprinting at a critical age ap- 
pears to be a function of the matura- 
tion of the neural and/or muscular 
connections involved in responding to 


the imprinting stimulus. A third con- 
clusion that might be made from im- 
printing studies (e.g., Moltz, 1960) is 
that the onset of imprinting is a func- 
tion, on the stimulus side, of temporal 
stimulus change in the form of move- 
ment, with the exact physical nature 
of the stimulus object essentially 
irrelevant. 

The elicitation of the following re- 
sponse, which serves as the behavioral 
indicator of imprinting, seems to be a 
FAP in the sense that the term is used 
in this paper. Thus, following a mov- 
ing stimulus at a certain critical age is 
a stereotyped, species-specific response 
occurring in many avian species. For 
this reason, any explanation of the
FAP, in either behavioral or neuro-
physiological terms, should also pro-
vide an explanation for imprinting and
other related developmental and criti-
cal period phenomena such as the
preferential choice behavior of human 
infants and chicks. 


SOME NEUROPHYSIOLOGICAL PROBLEMS 
RAISED IN STUDIES INVOLVED WITH 
FIXED ACTION PATTERNS 


The experiments summarized above 
indicate that many organisms exhibit 
differential stimulus preferences at 
birth. Furthermore, preferences change 
as a function of maturation and experi- 
ence, per se. An important conclusion 
from such evidence is that new-born 
animals, including primates and hu- 
mans, do not have to start from scratch 
to learn to see and to organize complex 
visual stimulation. Rather, these or- 
ganisms perceive and respond to com- 
plex stimuli without experience and 
learning. This conclusion, and other 
implications of the studies cited above, 
raises several behavioral and neuro- 
physiological problems. 

First, each FAP appears to require 
some highly specific environmental 
stimulus for the evocation of the be-
havior in question. This implies that
some mechanism must be built into the 
organism that functions to encode and 
categorize incoming information such 
that the FAP will be evoked when the 
appropriate information, in terms of 
specific stimulus attributes, excites the 
receptor organ. Secondly, the develop- 
mental nature of most, if not all, of 
the behavior described above implies 
that the neural mechanism mediating 
the FAP must go through a definite 
maturation period paralleling behav- 
ioral development. Thus, the question 
of a neural locus for perceptual devel- 
opment and organization, with particu- 
lar emphasis on the contributions of 
peripheral and central processes, be-
comes pertinent. The theory to be de-
veloped at this point is proposed as a 
partial solution to these problems. 

This paper proposes that the or- 
ganization and integration of visually 
controlled FAPs depend on the
peripheral encoding of afferent stimu- 
lation by specific sensory units (or the 
interaction of such units) lying at, or 
just below, the level of the receptor. 
Further, these stimulus-specific neural 
units undergo developmental and ma- 
turational changes, during which time 
units responsive to different specific 
aspects of environmental stimulation 
mature at different rates. During this 
maturation period, the organism be- 
comes sensitive to a wider variety and 
range of environmental stimulation as 
a function of the rate of differential de- 
velopment of the neural units. 

To recapitulate, the model presented 
here involves two basic assumptions. 
First, the major problem involved in 
explaining the stimulus (or releasing) 
function in unlearned behavior is a 
perceptual matter, with critical period 
phenomena being largely a function of 
the development or maturation of in- 
born perceptual capabilities. Secondly, 
innate perceptual capability depends on
peripheral neural units that are differ-
entially sensitive to specific attributes 
of environmental stimulation. 

Early work by Barlow (1953), Hart-
line (1938), and Kuffler (1953) showed
the great importance of peripheral vis- 
ual information processing systems in 
the retina. The studies to be presented 
in the next part of the paper are offered 
as evidence for the existence of neural 
units that function in the manner de- 
manded by this theory, and as evidence 
for the developmental nature of the 
visual modality. 


EVIDENCE FOR A PERIPHERAL INFOR- 
MATION PROCESSING SYSTEM AT 
THE LEVEL OF THE RETINA 


The work of Maturana, Lettvin, Mc-
Culloch, and Pitts (1960) is relevant
to the hypotheses advanced in this pa- 
per. Their experiments studied the 
function of the retinal ganglion cells 
of the frog, Rana pipiens, in response 
to “light and dark objects of various 
sizes and shapes moved in the visual 
field against various kinds of back- 
grounds” including a color photograph 
of the frog’s natural environment. 
This work followed from the discovery 
of three groups of retinal ganglion cells 
that appeared to fire specifically to 
either the onset, offset, or onset and 
offset of light-flash stimulation (e.g., 
Barlow, 1953; Hartline, 1938; Kuffler, 
1953). These pioneer studies estab- 
lished the existence of stimulus-specific 
neural units, but were confined solely 
to light-flash stimulation. The study to 
be reported in some detail here differs 
from earlier work in use of form, size, 
brightness, and movement as stimuli, 
rather than simple light flashes. 

The Maturana et al. studies recorded
electrical potentials from axons in the
optic nerve and from terminals in the
tectal lobes in unanaesthetized animals
with exposed tectum and/or optic
nerve. The results of these studies in-
dicated that the ganglion cells of the
frog fall into five classes serving to 
process information from specific at- 
tributes of environmental stimulation. 
These five classes were named as fol- 
lows: Sustained edge detection, convex 
edge detection, changing contrast de- 
tection, dimming detection, and dark- 
ness detection. The first four classes 
of cells apparently perform “data proc- 
essing operations on the visual image, 
while the fifth class measures light in- 
tensity.” Furthermore, the first four 
classes seemed to function indepen- 
dently of changes in general illumina- 
tion and of changes in the “general 
structure of the visual environment.” 
Also, the cells of Classes 1-4 appeared
to respond especially to moving objects, 
while the fifth class did not respond to 
movement. The following paragraphs 
will describe the functions of each class 
in greater detail, with special emphasis 
on the specific stimulus attributes that 
appear to be necessary for the 
activation of the cell. 


Sustained Edge Detection 


Cells in this class have a small recep- 
tive response field (RRF). The RRF 
refers to all areas on the retina which 
influence the response of the cell. 
Fibers from these cells project to the 
tectum by unmyelinated axons whose 
terminals are almost solely in the first 
tectal layer below the pia mater. These 
cells do not respond to on or to 
off changes in environmental stimuli 
whether such changes are sudden or 
gradual. Electrical activity is only 
produced by the introduction of a sharp 
edge of an object into the retinal field 
for this cell. The edge can be either 
lighter or darker than the background. 
Further, these units respond with a 
greater frequency to smaller than to 
larger edges, with the highest frequency 
occurring when the object moves into 
the RRF and stops. 


Thus, the duration and frequency of
electrical response in these cells ap-
pear to be a function of the speed of
movement and the position of a sharp 
edge in the RRF. Also, the response 
is almost independent of general illu-
mination level (i.e., there are no
changes in response due to changes in 
illumination above the threshold level 
of illumination for initiating the re- 
sponse). Finally, and most significant, 
an individual cell of this class gives 
either (a) a response only to move- 
ment in a single direction, or (b) a 
large response to movement in one di-
rection and very little response to
movement in the other direction.


Convex Edge Detection 


The second class of cells also has a 
small RRF and projects to the second 
tectal layer by unmyelinated fibers. 
These units respond with a strong burst 
of activity to the movement of a small 
object darker than the background. 
Straight edges produce very minimal, 
or no, response in these cells, while the 
response frequency increases with the 
degree of curvature of the stimulus 
object. Further, the response occurs 
only when the object stimulates the cell 
during its movement and then stops in 
the RRF. If the object moves into the 
RRF during a dark period and stops 
before illumination is presented, then 
no electrical activity is recorded when 
the illumination level is raised. Also, 
the stimulus object apparently must 
move toward the center of the RRF. 
If the object moves away from the 
RRF, activity ceases. 

Thus, although frequency of response 
in these cells is a function of the net 
positive curvature of the stimulus ob- 
ject, a necessary condition for activa- 
tion is movement. And, the move- 
ment must be toward the center of the 
RRF for that cell. Also, as men- 
tioned above, these cells respond maxi-
mally to objects darker than the back-
ground, and minimally or not at all to 
objects lighter than the background. 
Thus, the cell responds to a dark-light, 
but not to a light-dark contrast between 
stimulus object and background. Fi- 
nally, any background movement occur- 
ring while the stimulus object is within 
the RRF produces no change in re- 
sponse. The response is also indepen- 
dent of the general illumination level 
above threshold intensity. 


Changing Contrast Detection 


The cells involved in this response 
class lie in the fourth layer of the 
tectum, the third histological layer 
being “silent.” The axons of these 
units are thin myelinated fibers with 
larger RRFs than the cells of the first 
two types. These units respond to 
light onset and to light offset with 
small bursts of spikes at high fre- 
quency. Furthermore, this class is 
highly sensitive to movement in any 
single direction for any single cell. 
The number of spikes and the upper 
frequency of response is a function of 
the velocity of the moving edge of the 
stimulus object, with an optimal speed 
necessary for an optimal response. 
Thus, either movement that is too fast 
or too slow produces a less-than-opti- 
mal response. Also, these cells never 
give a response to a stationary object, 
but give long bursts to a slowly mov- 
ing object. Finally, the response is 
also independent of the general il-
lumination level, but is not as strong
at low illumination as that of the cells
of Classes 1 and 2.


Dimming Detection 


These cells project to the sixth tectal 
layer, the fifth being “silent.” The 
projections of these cells represent the 
largest RRFs of any cell tested. They
give a long response to light offset.
These units respond to any moving 
object regardless of size, shape, or con- 
trast, in proportion to the dimming 
produced by the object passing across 
the RRF. Thus movement and the 
absolute amount of illumination re- 
duction are the stimulus parameters 
determining responses for these cells. 
Although these cells have the largest
RRFs, they are the least numerous
among the first four classes.


Dark Detection 


Terminals from the axons of cells 
in the fifth class are apparently mixed 
with the terminals of Class 3 in the 
fourth tectal layer. These cells are 
continuously active, even in bright 
light; but, their maximal activity oc- 
curs in darkness. Thus, responses in 
this fifth class are an inverse function 
of the light intensity (i.e., the higher
the illumination level, the lower the 
response frequency). These units 
show little or no change in response 
to sudden illumination changes or to 
movement. 
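
The five classes described above can be collected into a small lookup structure. The sketch below is only a reading aid: the `GanglionClass` type and its field names are hypothetical, and the entries restate this section's summary rather than any recorded data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GanglionClass:
    number: int                 # class number used in the text
    name: str
    tectal_layer: int           # tectal layer in which the terminals end
    responds_to_movement: bool  # as summarized in the text


# Entries follow the descriptions in this section; layers 3 and 5 are "silent".
CLASSES = [
    GanglionClass(1, "sustained edge detection", 1, True),
    GanglionClass(2, "convex edge detection", 2, True),
    GanglionClass(3, "changing contrast detection", 4, True),
    GanglionClass(4, "dimming detection", 6, True),
    GanglionClass(5, "dark detection", 4, False),
]


def movement_sensitive(classes=CLASSES):
    """Class numbers described as responding to moving objects."""
    return [c.number for c in classes if c.responds_to_movement]


def classes_in_layer(layer, classes=CLASSES):
    """Class numbers whose terminals are described as ending in a given layer."""
    return [c.number for c in classes if c.tectal_layer == layer]
```

For example, `classes_in_layer(4)` returns Classes 3 and 5, whose terminals are described as mixed in the fourth layer, and `classes_in_layer(3)` returns an empty list for the "silent" third layer.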


Implications and an Exact Hypothesis 


The evidence presented above indi-
cates that, at least in the frog, a large
part of the information encoding and 
processing from specific environmental 
stimulus attributes occurs at the 
peripheral receptor level of the retinal 
ganglion cells. Thus, some, if not 
most, of the perceptual information 
necessary for the performance of dis- 
criminatory responses to a complex 
visual environment appears to be re- 
lated to specific retinal units which 
exhibit maximal or total sensitivity to 
specific dimensions of stimulation. 
These facts, of course, are not meant to 
support any notions that the central
nervous system is not necessary for
perception or information processing, 
but only that these peripheral functions 
perform complex operations irrespec- 
tive of those performed by the brain 
itself. 

The hypothesis to be developed from 
this evidence is concerned only with 
the receptive or perceptual aspects of 
innate behavior, and not with problems 
concerning the neurological bases of 
responses. Thus, to recapitulate, the 
notions to be presented below ignore 
central factors in perceptual and infor- 
mation processing functions involved 
in FAPs. They also ignore response 
factors involved in such phenomena in 
order to keep the hypotheses to be pre- 
sented in a manageable form. This 
simplification is based on the viewpoint 
that model-building, in an area of study 
so new and undeveloped as this, must 
start with simple mechanisms as a 
first approximation to a more com- 
plete neural model underlying innate 
behavior. 

Hypothesis I. The stimulus (re- 
leaser) function involved in the evoca- 
tion of fixed action patterns is organized 
and integrated by peripheral neural 
units. These units are responsive to 
certain specific aspects of environ- 
mental stimulation alone. In vision, 
these units are the ganglion cells of 
the retina. Any particular fixed action
pattern will result from the evocation 
of activity in one, or a combination, of 
these stimulus-specific cell groups. 

Hypothesis II. These peripheral 
neural units develop differentially in 
time as the neonate organism matures. 
With respect to visual stimulation, the 
organism will be responsive only to 
those aspects of environmental stimu- 
lation that correspond to the specific 
ganglion cells that have become func- 
tional at any moment in the organism’s 
maturation. 
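
Hypothesis II can be illustrated with a minimal sketch. The onset ages below are invented purely for illustration (the paper reports no such values), and the function names are hypothetical; the only claim carried over from the text is that the organism responds to a stimulus attribute only once the corresponding cell class has become functional.

```python
# Hypothetical onset ages (hours) at which each stimulus-specific cell class
# becomes functional. These numbers are illustrative assumptions, not data.
ONSET_HOURS = {
    "color": 0,
    "brightness": 4,
    "contour": 8,
    "movement": 16,
}


def responsive_attributes(age_hours, onset=ONSET_HOURS):
    """Attributes the organism can respond to at a given age."""
    return sorted(attr for attr, start in onset.items() if age_hours >= start)


def newly_functional(age_before, age_after, onset=ONSET_HOURS):
    """Attributes whose cell class matures between two ages."""
    return sorted(a for a, s in onset.items() if age_before < s <= age_after)
```

Under these assumed onsets, `newly_functional(10, 20)` picks out movement as the only attribute that becomes available in that interval, which is the kind of discontinuity the hypothesis relies on.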


AN EXPLANATION OF VISUALLY CON-
TROLLED FIXED ACTION PATTERNS
IN TERMS OF THE PROPOSED 
NEURAL MECHANISM 


Consummatory Behavior in the Toad 
and Frog 


The toad and the frog perform con- 
summatory responses only to a small 
moving stimulus object. The Maturana
et al. study offers a direct explanation 
for this phenomenon. The frog’s eye 
contains neural units which respond 
solely to movement. In fact, certain 
of these units respond only to direction 
of movement of a small object darker 
than the background. Such a stimu- 
lus is the necessary, and apparently 
sufficient, condition for evocation of 
the striking-capture-ingestion FAP in- 
volved in the frog’s consummatory be- 
havior. The behavior of the frog thus 
implies that a mechanism maximally 
sensitive to movement, to specific phys- 
ical size, and to figure-ground contrast 
determines the consummatory response. 
The frog’s ganglion cells are demon- 
strably capable of this function. 


The Development of Visual Perceptual 
Choice Behavior 


In Fantz’s experiments, the human 
infant displayed differential discrimi- 
natory and preferential perceptual re- 
sponses at different stages of matura- 
tion. The hypotheses suggested in this 
paper propose that differential matura- 
tion rates of stimulus-specific ganglion 
cells are responsible, at least in great 
part, for this phenomenon. The human 
infants in the Fantz experiment showed 
an initial preference for solid disks, 
and for a striped rather than a checker- 
board or a bull’s eye pattern, with pref-
erence for solid colored stimuli appear-
ing at the earliest stages in development. 

If ganglion cells responsive to cer- 
tain specific, single color, stimulation 
mature before cells responsive to
brightness changes, then perception
should proceed from single colors to 
stimuli differing in brightness. Sec- 
ondly, if ganglion cells responsive to 
brightness changes mature at a faster 
rate than cells responsive to contour 
(either amount of periphery or amount 
of curvature, or both), then preferen- 
tial fixation responses should proceed 
from stimuli differing in brightness 
alone to stimuli differing in both bright- 
ness and in some aspect of contour. 
This is the case with the development 
of choice responses from striped to 
bull’s eye stimuli. 

The concept of contour is considered 
in this paper to include either (a) 
amount of border (periphery), (b) de- 
gree of curvature, or (c) both. The 
exact identification for contour will 
depend on the stimulus being con- 
sidered in any particular case. The 
rationale for breaking contour into two 
categories comes from the Maturana
et al. study which found ganglion cells 
responsive solely to amount of border 
and other cells responsive solely to 
degree of curvature. 

In a discussion of these and similar 
phenomena, including the newly 
hatched chick’s preference for round- 
ness in pecking behavior, Berlyne 
(1960) suggests the following hy-
pothesis:


These preferred patterns contain much more 
contour than the ones with which they were 
paired, and this fact may well be the key. 
Many of the receptors in the retina . . .
become active only when light begins or
ceases to impinge on them. And the eyes
are generally in motion. . . . Consequently
patterns with more contour will bring . . .
[certain classes of] receptors into play to a
greater extent and thus have a higher stimu- 
lation value (p. 99). 


The neural requirements proposed by 
Berlyne’s hypothesis appear to fit the 
model presented here quite well, par- 
ticularly in explaining specific short-
term preferences that change as the
organism matures. 


Visual Supernormal Releasing Stimuli 


A supernormal releaser was defined 
above as a stimulus object having some 
attribute or attributes different from a 
comparable stimulus object as found in 
nature, and which is more effective 
than the natural stimulus in evoking an 
FAP in a particular species. Given a 
choice between the naturally occurring 
stimulus (e.g., an egg actually laid by 
a member of the species in question) 
and a supernormal stimulus (e.g., an 
egg twice the size of the natural egg), 
the animal will prefer (make approach 
responses toward) the supernormal 
stimulus object. Such a situation rep- 
resents the operational definition of a 
supernormal releasing stimulus. 

In the examples of supernormal re- 
leasers cited above, amount of contrast 
or degree of contour appear to be the 
major stimulus factors determining 
preferential egg setting responses in 
the Ringed Plover and Oyster Catcher. 
The model offered in this paper ex- 
plains these phenomena in the follow- 
ing manner. Retinal ganglion cells are 
differentially sensitive to visual con- 
trast and to stimulus contour (amount 
of border or degree of roundness). 
The greater the degree of contrast or 
contour present in a stimulus, the 
greater is the degree of electrical activ- 
ity in the cell populations responsive 
specifically to contrast and contour, be- 
cause more contour- or contrast-specific 
ganglion cells are assumed to be ac- 
tivated. Thus, a stimulus with more 
contrast or contour will have a higher 
“stimulation value” than will a stimu- 
lus with less contrast or contour because 
(a) more ganglion cells will be acti- 
vated, and/or (b) the frequency of 
discharge in a fixed population of stim- 
ulus-specific cells will increase. 
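
The "stimulation value" argument can be made concrete with a toy index. The sketch below assumes, purely for illustration, that stimulation value is the product of the number of active stimulus-specific cells and their mean discharge frequency; the paper says only that one or both quantities increase, so the product rule, the function names, and the example numbers are all assumptions.

```python
def stimulation_value(active_cells, mean_rate):
    """Toy index: active cell count times mean discharge frequency (assumed rule)."""
    if active_cells < 0 or mean_rate < 0:
        raise ValueError("cell count and rate must be non-negative")
    return active_cells * mean_rate


def preferred(stimuli):
    """Name of the stimulus with the highest toy stimulation value.

    `stimuli` maps a name to an (active_cells, mean_rate) pair.
    """
    return max(stimuli, key=lambda name: stimulation_value(*stimuli[name]))


# Hypothetical numbers: the supernormal egg recruits more contour-specific cells.
eggs = {"natural": (100, 10.0), "supernormal": (150, 10.0)}
choice = preferred(eggs)
```

Either route named in the text, more cells at the same rate or the same cells at a higher rate, raises the index for the supernormal stimulus in this sketch.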




This hypothesis can be put into more 
psychological terms by assuming that 
the greater the contrast or contour 
present in a stimulus, the greater is 
the complexity (i.e., spatial hetero-
geneity) or novelty (i.e., temporal
stimulus change) in the situation. 
Within certain boundary limits, the 
greater the degree of complexity or 
novelty the greater are the assumed 
“invitational properties” of the stimu- 
lus and, correspondingly, the higher 
will be the probability of a preferential 
choice response for the stimulus with 
greater complexity, or change over 
time, or both (Dember & Earl, 1957). 

In this context, a supernormal re- 
leaser is viewed as a stimulus object 
presenting a greater degree of spatial 
stimulus change (complexity) or tem- 
poral stimulus change (either novel or 
nonnovel) to the organism than does a
comparable natural stimulus. Thus, 
the motivational effects of visual stimu- 
lus change are assumed to be mediated, 
at least in part, by increased activity in 
stimulus-specific ganglion cell popula- 
tions. Therefore, a supernormal re- 
leaser is effective in evoking prefer- 
ential choice responses over the natural 
stimulus because the greater amount 
of stimulus change (complexity or 
novelty) in the supernormal releaser 
results in electrical activity in a greater 
number of ganglion cells, or in in- 
creased activity in a fixed population 
of cells specific to the stimulus proper- 
ties of that releaser. Thus, the degree 
of activity in stimulus-specific ganglion 
cells is an increasing function of in- 
crements in the amount of complexity 
or temporal change on specific attri- 
butes of the stimulus (e.g., contrast, 
contour, etc.). 


Imprinting Phenomena and Critical 
Age Behavior 


The theory offered here explains the 
onset of the imprinting critical period
as follows: The developmental hypoth-
esis states that the stimulus-specific 
functions of the ganglion cells mature 
at a differential rate. In the imprint- 
ing situation it is hypothesized that 
cells responsive to movement do not 
mature as rapidly as cells responsive to 
other specific aspects of stimulation 
such as brightness, contrast, and con- 
tour. When the visual cells responsive 
to movement become functional (at 
the critical period for imprinting) 
movement is perceived for the first 
time. This first perception of move- 
ment stands out in great temporal dis- 
tinctiveness to visual stimulation per- 
ceived prior to maturation of these 
cells. If one makes the plausible as- 
sumption that attentional (invitational) 
properties of stimulation are positively 
affected by novelty (i.e., temporal stim-
ulus change), then the first perception
of movement at this critical age would 
serve as a particularly potent invita- 
tional stimulus for the following re- 
sponse measured in imprinting studies. 
A test of this hypothesis by electro- 
retinogram or single ganglion cell re- 
cording should show a differential rate 
of development in the ganglion cell 
population of the chick or duckling. 
Large populations of stimulus-specific 
cells should become active at the criti- 
cal age for imprinting, as evidenced 
behaviorally, in a given species. 


Evidence for the Maturation of the 
Retina in the Chick 


Peters, Vonderahe, and Powers
(1958) present evidence for matura- 
tion of central and peripheral neural 
activity in pre- and posthatch chicks. 
In their study, the eye and the optic 
lobes of newly hatched and of embryo 
chicks were examined for changes in 
electrical activity in response to photic 
stimulation with white light presented 
at frequencies from a single flash to 
60 flashes per second.

Prior to the seventeenth day of
incubation (3-4 days before hatch- 
ing), neither the eye nor the op- 
tic lobes gave electrical changes in 
response to photic stimulation. On the 
eighteenth day of incubation, both the 
eye and the contralateral lobe showed 
definite but inconsistent electrical 
changes. In the newly hatched chick, 
the stimulated eye and the contra- 
lateral optic lobe each gave on and off 
responses to onset and cessation of 
sustained stimulation at 60 flashes per 
second. Both structures also exhibited 
distinctive electrical changes to single 
flashes and to frequencies lower than 
60 flashes per second. 

Spontaneous rhythmical electrical 
activity was recorded from the optic 
lobes several days before the eye 
showed electrical responses to photic 
stimulation. The optic lobes must, 
therefore, mature faster than do the 
peripheral retinal units. Thus, “the 
maturation of the retina is the limiting 
factor in the general response to light 
as well as to the ability to react to indi- 
vidual flashes at certain frequencies” 
(Peters et al., 1958, p. 466). Thus the 
eye depends on maturation to reach its 
final level of functioning. Further- 
more, peripheral retinal factors may be 
more important in determining the 
maturation of visual perception than 
more central factors.


An evaluation of theoretical treatments of the level of aspiration con-
cept including utility maximization, Simon's satisficing model, Siegel's
measurement technique, and the theories of changes in aspiration
proposed by Festinger and the author. It is suggested that levels of
aspiration are highly operational goals in a hierarchical cognitive struc-
ture, this structure being subject to quasi-logical constraints of the type
recently emphasized by attitude theorists.


In recent years the level of aspira- 
tion concept has enjoyed a revival as 
a basic element in cognitive theory. 
One reason for interest in the concept 
is that it forms a meeting ground be- 
tween modern utility theorists and 
Lewinian field theorists. It is the 
purpose of this paper to review the 
theoretical background of the concept 
and to suggest some generalizations to 
a broader theory. 


Level of Aspiration 


A level of aspiration is a subjective 
goal for performance. It serves as the 
reference point for feelings of success 
or failure. To the problem solver, 
performance which exceeds the level 
of aspiration is success, and perform- 
ance which falls short of the level of 
aspiration is failure. 

The problem solver has goals (de-
noted by the subscript i). He per-
ceives behavior alternatives (denoted by 
the subscript j) and mutually ex- 
clusive consequences of these behavior 
alternatives (denoted by the sub- 
script k). The problem solver’s task 
is to choose a behavior alternative, the 
consequences of which will satisfy his 
goals. 

Relative to a specific goal, the
utility of a particular consequence is
$C_{ik}$. There is also utility, $A_{ij}$, at-
tached to a particular behavior al-
ternative.² The usual treatment of
the problem solving process would
make the net utility of achieving a
particular consequence by means of a
particular behavior alternative

\[ U_{ijk} = C_{ik} + A_{ij}, \]

and

\[ U_{jk} = \sum_{\text{all } i} U_{ijk}. \]

The author is indebted to James G. March
and Jacob Marschak for their comments and
criticisms.


The problem solver's goals structure 
his preferences, but do not enter 
directly into the utility function. 
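
A short sketch may help fix the indexing: goals are indexed by i, behavior alternatives by j, and consequences by k, and the net utility of reaching consequence k through alternative j is the sum over goals of the consequence utility and the alternative utility. The nested-list layout, the function name, and the example values are assumptions made for illustration only.

```python
def net_utility(C, A, j, k):
    """Sum over goals i of C[i][k] + A[i][j].

    C[i][k]: utility of consequence k relative to goal i.
    A[i][j]: utility attached to behavior alternative j relative to goal i.
    """
    if len(C) != len(A):
        raise ValueError("C and A must cover the same goals")
    return sum(C[i][k] + A[i][j] for i in range(len(C)))


# Hypothetical values: two goals, two alternatives, two consequences.
C = [[0.0, 0.5],
     [0.1, 0.3]]
A = [[0.0, -0.2],
     [0.0, -0.1]]
u = net_utility(C, A, j=1, k=1)  # (0.5 - 0.2) + (0.3 - 0.1)
```

Note that the goals enter only as the index of summation, which is the sense in which they structure preferences without appearing directly in the utility function.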

The level of aspiration concept sug- 
gests that an analysis of the problem 
solver’s frame of reference must take 
into account the affective and conative 
reactions to success and failure. In 
particular, two additional utilities 
must be considered: the utility of 
success, $S_{ijk}$, and the utility of failure,
$F_{ijk}$. For each goal there is at least
one level of aspiration, $L_i$, which
serves as an operational reference 
point. The net utility of achieving a 
particular behavior alternative is a 
function of these levels of aspiration: 


\[ U^*_{ijk} = U_{ijk} + S_{ijk} \quad \text{if } U_{ijk} + S_{ijk} \geq L_i, \]
\[ U^*_{ijk} = U_{ijk} + F_{ijk} \quad \text{if } U_{ijk} + S_{ijk} < L_i, \]

and

\[ U^*_{jk} = \sum_{\text{all } i} U^*_{ijk}. \]

² $A_{ij}$ probably tends to be negative (e.g.,
unpleasant effort), but it can be positive (e.g.,
Allport’s “functional autonomy”).

The utilities $S_{ijk}$ and $F_{ijk}$ are prob-
ably functions of $A_{ij}$, $C_{ik}$, $L_i$, the
subjective probabilities of success or
failure, and the degree of privacy sur-
rounding the problem solver.³
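
The step introduced by the level of aspiration can be sketched as follows. This is a minimal illustration, assuming the success utility is added when the ordinary utility plus the success utility reaches the level of aspiration and the failure utility is added otherwise; the function names and the numbers are hypothetical.

```python
def adjusted_utility(u, s, f, level):
    """Aspiration-adjusted utility for one goal.

    u: ordinary net utility, s: utility of success, f: utility of failure,
    level: level of aspiration for this goal.
    """
    return u + s if u + s >= level else u + f


def total_adjusted_utility(goals):
    """Sum the adjusted utility over goals, each given as (u, s, f, level)."""
    return sum(adjusted_utility(u, s, f, level) for u, s, f, level in goals)


# Hypothetical values: one goal is met, the other falls short.
met = adjusted_utility(0.2, 0.3, -0.4, 0.25)     # 0.2 + 0.3 reaches 0.25
missed = adjusted_utility(-0.3, 0.3, -0.4, 0.25)  # -0.3 + 0.3 falls short
```

The jump between the two branches is the discrete step that the satisficing interpretation builds on.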


Maximizing and Satisficing Hypotheses 


An adequate theory of problem 
solving based on the level of aspiration 
concept must explain (a) how be- 
havior alternatives are selected given 
a level of aspiration and (b) how the 
level of aspiration is affected by 
experience and perception. 

One hypothesis about the selection 
of behavior alternatives follows nat- 
urally from modern utility theory. 
That is, behavior alternatives are 
chosen, given a level of aspiration, in a 
way that maximizes expected utility. 

In order to maximize in a multiple- 
goal framework, consequences must be 
comparable across all goals. In 
general, there must be a single scalar 
measure of utility. Simon (1955) has 
treated the multiple-goal case in a way 
that does not require comparability 
across goal dimensions; goals can be 
stated in dimensions which are di- 
rectly observable in the world. The 
utility $U^*_{ijk}$ takes a discrete step at $L_i$.
Simon observed that in many cases 
this step is large relative to increments 
in the utility function at other points,
so that the problem solver es-
sentially factors all consequences into 
one of two classes: satisfactory and 
unsatisfactory. Simon hypothesized 
that a behavior alternative is chosen 
such that $U_{ijk} > L_i$ for all $i$ and all
subjectively possible k. 
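
The two selection rules can be contrasted in a few lines. The sketch below treats a single goal and uses invented probabilities and utilities; the function names are hypothetical, and "subjectively possible" is represented, as an assumption, by a nonzero subjective probability.

```python
def expected_utility(probs, utils):
    """Probability-weighted sum of utilities over consequences."""
    return sum(p * u for p, u in zip(probs, utils))


def maximizing_choice(alternatives):
    """Alternative with the highest expected utility.

    `alternatives` maps a name to a (probabilities, utilities) pair.
    """
    return max(alternatives, key=lambda a: expected_utility(*alternatives[a]))


def satisficing_choices(alternatives, level):
    """Alternatives whose every possible consequence exceeds the aspiration level."""
    return [
        name
        for name, (probs, utils) in alternatives.items()
        if all(u > level for p, u in zip(probs, utils) if p > 0)
    ]


# Hypothetical single-goal example with two consequences per alternative.
alts = {
    "low effort": ([0.5, 0.5], [0.0, 1.0]),
    "medium effort": ([0.2, 0.8], [0.3, 0.9]),
    "high effort": ([0.0, 1.0], [0.1, 0.6]),
}
best = maximizing_choice(alts)
acceptable = satisficing_choices(alts, level=0.25)
```

The maximizing rule returns one alternative, while the satisficing rule returns every alternative whose worst possible outcome clears the level of aspiration, leaving the final pick undetermined.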


³ $U^*_{ijk}$ could also be defined:

\[ U^*_{ijk} = U_{ijk} + S_{ijk} \quad \text{if } U_{ijk} > L_i, \]

and

\[ U^*_{ijk} = U_{ijk} + F_{ijk} \quad \text{if } U_{ijk} < L_i. \]


TABLE 1
OUTCOME UTILITIES

Consequence k            C_k
(number of scores)

        0                 10
        1                -20
        2                -50
        3                -60


An example may make clear the 
difference between the maximizing hy-
pothesis and Simon's “satisficing”
hypothesis. Assume that a subject is
asked to throw three darts at a target. 
The target is binary; a dart either 
scores or does not score. There are 
four possible consequences according 
to the number of darts which score: 
0, 1, 2, or 3. If the subject has only 
one goal, to hit the target with the 
darts, his outcome utilities might be 
those in Table 1. These utilities are 
independent of the particular behavior 
strategy chosen; they could be mone- 
tary payoffs for example. 

The subject must choose a behavior 
alternative, say the amount of effort 
he exerts on the task (Column 1, 
Table 2). Each behavior alternative 
implies a utility (or disutility) which 
is independent of the consequence 
(Column 2, Table 2), and also implies 
a conditional probability for each con-
sequence, $P_{kj}$ (Column 3, Table 2).
The behavior alternatives and con-
sequences together imply utilities of
success and failure (Columns 4 and 5,
Table 2). In an experimental situa-
tion like this one, the utilities $S_{jk}$ and
$F_{jk}$ could easily dominate the outcome
utilities, $C_k$.

If the subject’s level of aspiration
is .25, the net utilities $U^*_{jk}$ would be
those in Column 6, Table 2. The
expected utilities from Columns 3 and
6 of Table 2 are given in Table 3. If
the subject chooses the behavior 
alternative which maximizes his ex-

TABLE 2
NET UTILITIES
Columns: (1) behavior alternative j; (2) A_j; (3) subjective probability
of k given j, P_kj, for k = 0, 1, 2, 3; (4) S_jk; (5) F_jk;
(6) U*_jk for L = .25.
I 0 | .97 | .03 0 |.50 
II — .10 | .86 | .14 02 | .53 
III — .20 | .73 | .24 | .03 04) 56) .81 | 
IV — 30 | 51 | .39 | .09 | .01 |.06 | .59 | .84 |1.02 
V — .40 | .30 | .44 | .22 | .04 | .08 | .62 | .87 |1.06 
VI — .50 | .12 | .38 | .38 | .12 |.10|.65| .90 [1.10 
VII — .60 | .04 | .22 | .44 | .30 |.12|.68| .93}1.14 
VIII — .70 | .01 | .09 | .39 | .51 |.14[|.71]| .96 [1.18 
IX — .80 03 | .24 | .73 99 11.22 
X — .90 .14 | .86 1.02 |1.26 
XI —1.00 .03 | .97 1.05 |1.30 
XII —1.10 .01 | .99 1.08 |1.34 
XIII —1.20 1,00 1.38 


pected utility, he would choose Al- 
ternative X. On the other hand, if 
the subject behaves according to 
Simon’s hypothesis he could choose 
any of the alternatives X, XI, XII, 
and XIII. Each alternative would be 
evaluated on the basis of the worst 
possible outcome, the alternative’s 
“security level” as the term is used in 
game theory.4 Alternatives X, XI, 
XII, and XIII all have security levels 
above the level of aspiration (as given 
in Table 3). 

There are distinct differences between
the two models.⁵ The maximizing
hypothesis suggests that the
problem solver makes comparatively
careful analyses of his own environment.
At least for alternatives close
to the optimum (say alternatives VII
to XIII inclusive in the example), the
subject must assess fairly accurately
the utilities C_k, A_j, and S_jk and the
probabilities P_kj. Simon’s “satisficing”
hypothesis suggests that the
problem solver makes comparatively
crude analyses of his goals and
his environment. Consequences are
classified as possible or impossible; the
probabilities P_kj are classified as zero
or nonzero. The utilities C_k, A_j, and
S_jk are assessed only carefully enough
to determine whether C_k + A_j + S_jk ≥ L,
or not. The subject’s goal, under
Simon's hypothesis, could have been
stated: “Choose an alternative such
that the consequences 0 and 1 are
impossible.”

4 The analogy between Simon’s satisficing
model and the maximin criterion of game
theory is obvious. However, Simon relies
upon dynamic adjustments to produce a
nearly optimal solution rather than specifying
that a maximin choice is always made. Moreover,
Simon views satisficing as just one of
several alternative decision strategies which a
problem solver might use on different occasions,
depending upon which decision strategy
seems to be most appropriate.

5 Some utility theorists would argue that
the characterization of the maximization hypothesis
given here is unfair. Their position
would be, to quote one, “No one seriously
interested in utility theory believes . . . that
people actually attempt to maximize expected
utility, that is, they compute the
relevant probabilities and utilities, and their
products. . . . The hypothesis is always
stated in the form: Individuals behave as
though they were maximizing expected utility.”
The author's position on this point is
as follows. First, none of the comments in
this section are intended to characterize one
hypothesis as better and the other as worse.
The comparisons simply point to relative
differences. Second, since the decision process
in question is the subject of inference rather
than observation, the satisficing hypothesis
is also a “behaves as though” hypothesis.
Provided that “behaves as though” is inserted
uniformly throughout this section, the comparisons
stand.

TABLE 3
EXPECTED UTILITIES

Behavior         Expected utility    Security level
alternative j    E(U*)               (minimum of U*_jk)
I                −.270               −.30
II               −.290               −.44
III              −.256               −.58
IV               −.069               −.72
V                 .191               −.86
VI                .499               −1.00
VII               .723               −1.14
VIII              .748               −1.28
IX                .878               −1.09
X                 .912s               .62s
XI                .890                .55s
XII               .836                .48s
XIII              .780                .78s

Note: s identifies satisfactory solutions for L = .25.
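The contrast between the two criteria can be made concrete with a short sketch. It is only an illustration under stated assumptions: the numbers are transcribed from Table 3 (the expected utility for Alternative X is read as .912, where the printed digit is unclear), and the function names are hypothetical, not part of either model's original statement.

```python
# Expected utility and security level for each behavior alternative,
# transcribed from Table 3 (L = .25). The value .912 for X is an assumed
# reading of a partly illegible entry.
TABLE_3 = {
    "I": (-.270, -.30), "II": (-.290, -.44), "III": (-.256, -.58),
    "IV": (-.069, -.72), "V": (.191, -.86), "VI": (.499, -1.00),
    "VII": (.723, -1.14), "VIII": (.748, -1.28), "IX": (.878, -1.09),
    "X": (.912, .62), "XI": (.890, .55), "XII": (.836, .48),
    "XIII": (.780, .78),
}


def maximizing_choice(table):
    """Return the single alternative with the highest expected utility."""
    return max(table, key=lambda j: table[j][0])


def satisficing_choices(table, level_of_aspiration):
    """Return every alternative whose worst possible outcome (security
    level) is at least the level of aspiration."""
    return [j for j, (_, security) in table.items()
            if security >= level_of_aspiration]


print(maximizing_choice(TABLE_3))          # the maximizer's unique choice
print(satisficing_choices(TABLE_3, .25))   # the satisficer's acceptable set
```

The maximizing rule needs the full expected-utility column, whereas the satisficing rule needs only a yes-or-no comparison of each security level with L, which is the sense in which the second analysis is cruder.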

The maximizing model produces a 
unique behavioral solution. In the 
example, only Alternative X is a satisfactory
solution. Alternatives IX and
XI, which are close to X in expected 
utility, are not satisfactory solutions. 
Consequently, the problem solver 
must exercise fairly accurate control 
of his own behavior. In contrast, the 
satisficing model does not produce a 
unique behavioral solution. In fact 
the behavior choice in the example 
could have been stated (giving a 
substantive interpretation to A_j):
“Exert at least .90 units of effort.” 

This is not to suggest that the 
maximizing and satisficing hypotheses 
are antithetical. The models are
fundamentally complementary. The 
maximizing model was developed 
under the assumption of essentially 
perfect certainty. The decision maker 
can perceive nearly all consequences 


and behavior alternatives, and the 
causal links between behavior and
consequence are fairly clear. Simon's 
model was developed primarily to ex- 
plain behavior in situations where the 
problem solver perceives behavior 
alternatives sequentially, and where 
the causal links between behavior and
consequence tend to be unclear. 
Simon emphasizes the effects of 
changes in the level of aspiration over 
time on the behavior choice. If the 
level of aspiration tends to rise after 
success or when several satisfactory 
solutions are perceived, the behavior 
choice will tend to be unique and 
nearly optimal. In the example, 
simultaneous perception of Alterna- 
tives X, XI, XII, and XIII could lead 
to a rise in the level of aspiration to 
some value between .62 and .78. 
Hence only Alternative XIII would be 
a satisfactory solution. (See Table 4.)
Alternatively, simultaneous percep- 
tion of the four satisfactory alterna- 
tives could change the decision cri- 
terion to maximum expected utility, 
with or without a change in the level
of aspiration. If L remains at .25, the
solution would be Alternative X as in 
Table 3; if L rises to .70 the solution 
would be Alternative XI as in Table 4.


Measurement of the Level of Aspiration 


The level of aspiration is most likely
to be a significant concept in situations
where the problem solvers’
subjective definitions of success play
important roles in their decisions. In
many circumstances (particularly circumstances
of the type often created
for an experiment), subjectively defined
success is important simply
because the net utilities U_jk are
relatively unimportant . . . little effort
required and small monetary or
social rewards. Moreover, as Simon
suggests, subjectively defined success
could assume importance because
making it important yields an effective
decision criterion for uncertain and
unknown environments.

TABLE 4
REVISED NET AND EXPECTED UTILITIES

Behavior        U*_jk for L = .70                Expected   Security level
alternative j   k=0     k=1     k=2     k=3      utility    (minimum of U*_jk)
I               −.30     .70                                −.30
II              −.44    −.18                                −.44
III             −.58    −.31    1.11                        −.58
IV              −.72    −.44    1.04    1.32     −.432      −.72
V               −.86    −.57     .97    1.26     −.245      −.86
VI             −1.00    −.70     .90    1.20      .100      −1.00
VII            −1.14    −.83     .83    1.14      .479      −1.14
VIII           −1.28    −.96     .76    1.08      .748      −1.28
IX                     −1.09    −.54    1.02      .582      −1.09
X                               −.67     .96      .732      −.67
XI                              −.80     .90      .849s     −.80
XII                             −.93     .84      .822      −.93
XIII                                     .78      .780       .78s

Note: s identifies satisfactory solutions for L = .70.
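The expected utilities and security levels of Table 4 follow mechanically from the subjective probabilities of Table 2 and the revised net utilities. The sketch below recomputes them; it assumes the values as read from Tables 2 and 4 (several printed signs and decimal points are unclear in the source and are taken here as shown), and all variable and function names are hypothetical.

```python
# Subjective probabilities P_kj for k = 0..3, as read from Column 3 of Table 2.
P = {
    "I": [.97, .03, 0, 0],      "II": [.86, .14, 0, 0],
    "III": [.73, .24, .03, 0],  "IV": [.51, .39, .09, .01],
    "V": [.30, .44, .22, .04],  "VI": [.12, .38, .38, .12],
    "VII": [.04, .22, .44, .30], "VIII": [.01, .09, .39, .51],
    "IX": [0, .03, .24, .73],   "X": [0, 0, .14, .86],
    "XI": [0, 0, .03, .97],     "XII": [0, 0, .01, .99],
    "XIII": [0, 0, 0, 1.00],
}

# Net utilities U*_jk for L = .70, as read from Table 4.
# None marks a consequence that is impossible for that alternative.
U = {
    "I": [-.30, .70, None, None],     "II": [-.44, -.18, None, None],
    "III": [-.58, -.31, 1.11, None],  "IV": [-.72, -.44, 1.04, 1.32],
    "V": [-.86, -.57, .97, 1.26],     "VI": [-1.00, -.70, .90, 1.20],
    "VII": [-1.14, -.83, .83, 1.14],  "VIII": [-1.28, -.96, .76, 1.08],
    "IX": [None, -1.09, -.54, 1.02],  "X": [None, None, -.67, .96],
    "XI": [None, None, -.80, .90],    "XII": [None, None, -.93, .84],
    "XIII": [None, None, None, .78],
}


def expected_utility(j):
    """Probability-weighted sum of net utilities over possible consequences."""
    return sum(p * u for p, u in zip(P[j], U[j]) if u is not None)


def security_level(j):
    """Worst net utility among the possible consequences."""
    return min(u for u in U[j] if u is not None)


expected = {j: round(expected_utility(j), 3) for j in P}
security = {j: security_level(j) for j in P}
best = max(expected, key=expected.get)
satisfactory = [j for j in P if security[j] >= .70]
print(best, satisfactory)
```

Under these readings the maximizing choice moves to Alternative XI and only Alternative XIII remains satisfactory, in agreement with the discussion of Table 4 in the text.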

Operationally, to say that subjectively
defined success is important
is to say that the differences between
F_jk and S_jk are large relative to
variations in U_jk. The increment
from U_jk + F_jk to U_jk + S_jk at the
point L_j must dominate, in some sense,
the increments in U_jk itself.

Siegel (1957) has proposed an 
interesting method for measuring the 
level of aspiration which is based on 
this dominance. The subject is given 
the opportunity to specify his choices 
among several hypothetical alternatives.
(“Would you rather have α or
a 50% chance of β and a 50% chance
of γ?”) The subject is told that the
consequences attached to these hy- 
pothetical alternatives are the con- 
sequences possible in the referent 
situation. The subject’s choices are 
used to establish an ordered metric
preference scale over the consequences.⁶
Then on the premise that
the step from U_jk + F_jk to U_jk + S_jk
at the point L_j is the largest interval
in the set, Siegel infers that


L = U*_{k1},


where U*_{k1} − U*_{k2} ≥ U*_k − U*_{k−1}
for all k and where k1 > k2 if
U*_{k1} > U*_{k2}. In fact, the strongest
inference that should be made is


U*_{k2} < L ≤ U*_{k1}.
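Siegel's inference amounts to locating the largest interval on the ordered utility scale. A minimal sketch follows, assuming the net utilities of the hypothetical consequences are already available as numbers (in Siegel's procedure only their ordered metric is established); the function name and the example values are illustrative assumptions.

```python
def siegel_interval(net_utilities):
    """Return (lower, upper) such that lower < L <= upper, where the pair
    brackets the largest gap between adjacent ranked net utilities."""
    ranked = sorted(set(net_utilities))
    if len(ranked) < 2:
        raise ValueError("need at least two distinct utilities")
    # Each entry is (size of gap, lower utility, upper utility).
    gaps = [(hi - lo, lo, hi) for lo, hi in zip(ranked, ranked[1:])]
    _, lower, upper = max(gaps)
    return lower, upper


# Illustrative values for one referent alternative: one failure outcome
# and two success outcomes.
print(siegel_interval([-1.22, .62, .96]))
```

The function returns only the bracketing pair, in line with the weaker statement that L lies somewhere in the interval rather than exactly at its upper end.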


Aside from the obvious assumption
about the dominance of subjectively
defined success, several other assumptions
are implicit in Siegel’s method.
First, the measurement procedure
substitutes an experimental situation
for a referent situation. This substitution
must not distort the subject’s
behavior. Net utilities in the experimental
situation must have the same
basic relationships to each other as
do net utilities in the referent situation;
success, failure, and effort relative
to the experiment itself must be
insignificant. Second, the referent
situation must be fairly simple. The
experiment assumes that the subject
has only one goal and that he is
accustomed to perceiving all alternatives
simultaneously. Third, the net
utilities U*_jk in the referent situation
must be a single valued function of the
consequences k. There must be a
one-to-one correspondence between behavior
alternatives and consequences,
and the subject must have had enough
experience in the referent situation to
make this association. Fourth, the
experiment must cover all consequences
objectively possible in the
referent situation.

6 An ordered metric scale is a rank ordering
for both the consequences and the utility
increments between consequences. If consequences
are continuous rather than discrete,
an interval scale must be used. See Siegel
(1956) for a discussion of ordered metric
scales.

Siegel's approach requires that some 
strong assumptions be made. How- 
ever, strong assumptions are necessary 
for inferring the decision criteria from 
the decisions made. Siegel's method 
is novel and imaginative in using this 
reverse deduction. Usually experi- 
menters simply ask their subjects: 
“How well do you hope to do?” It is 
not obvious what substantive content 
the subjects’ verbal responses have.⁷

What inference would Siegel’s 
method make in the example above? 
Obviously the inference depends upon 
the subject's actual level of aspiration 
and the behavior alternative which 
the subject uses as a referent. Assume
that the subject’s level of aspiration 
is .25 as in Tables 2 and 3. If the 
subject uses Behavior Alternative X 
as a referent, Siegel’s method would 
infer that —1.22 < L < .62. If the 


7 Siegel’s technique has been used
advantageously several times. The study
presented in Siegel (1957) is treated more
elaborately in Becker and Siegel (1958). See 
also Becker (1958) and Siegel and Fouraker 
(1960) (especially pages 61-102). 


subject uses Behavior Alternative XI
as a referent, Siegel's method would
infer that −1.61 < L < .41. If a
series of experiments could be run, 
where the referent behavior alterna- 
tive was controlled and actually 
varied from I to XIII inclusive, 
Siegel’s method could lead to the con- 
clusion that —.30 < L <.28. A 
similar series of experiments for a 
subject whose level of aspiration is 
.70 as in Table 4 could lead to the 
conclusion that —.30 < L < .70. It 
is likely, however, that an attempt to 
actually vary the subject’s referent 
behavior alternative experimentally 
would produce changes in his level of 
aspiration as well. Goals are un- 
doubtedly dependent upon the be- 
havior alternatives available. 


Changes in the Level of Aspiration 


It is important that levels of aspira- 
tion be changed adaptively . . . par- 
ticularly if the problem solver is to 
approach optimality under the satis- 
ficing hypothesis. 

As reference to Psychological Ab- 
stracts will confirm, a great many 
studies have been directed toward the 
effects of environment and perception 
on the level of aspiration.⁸ The
evidence will not be summarized in
detail here, although da Cunha Pereira 
(1956), Guerin (1958), Lewin, Dembo, 
Festinger, and Sears (1944), and
Rotter (1942) have published reviews. 
As one might expect, the findings 
indicate that changes in the level of 
aspiration are (a) adaptively func- 
tional—they tend to keep goals con- 
sistent with reality—(b) highly de- 
pendent upon recent and/or similar 
experiences, and (c) generally depend- 
ent upon cultural background, long 
run achievement history, and social 
forces. 

8 In addition, March and Simon (1958)
have proposed some interesting hypotheses.
See especially pages 47-52 and page 120.
Festinger (1942) proposed a theoretical
explanation for changes in the
level of aspiration. This hypothesis 
is elaborated in the well-known paper 
by Lewin, Dembo, Festinger, and 
Sears (1944). To quote Festinger 
(1942): 


We have distinguished four factors which
influence the choice of a goal: the positive
valence of success (Va_s), the negative valence
of failure (Va_f), the potency of success (Po_s),
and the potency of failure (Po_f). The choice
of goal region (L), that is to say, the level
of aspiration, will be determined by the
resultant force toward L, the strength of
which depends upon these four factors. This
resultant force (f*) for a given level of
difficulty may be determined by the equation:


(1) f*_{P,L} = Po_{s,L}(Va_{s,L}) − Po_{f,L}(Va_{f,L})

That region (L) toward which f* is greatest
will be chosen as the goal region.

(2) Level of aspiration = L at which
f*_{P,L} = maximum


The curve of the resultant forces which will
determine the choice of goal region may be
derived by formula (1). The region at which
this curve reaches its maximum will be the
level of aspiration. (pp. 239-240).
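Festinger's two formulas can be illustrated numerically. The sketch below is hedged throughout: the five levels of difficulty and all of their valences and potencies are invented for illustration, potency is treated as a subjective probability, and the function names are hypothetical.

```python
def resultant_force(po_success, va_success, po_failure, va_failure):
    """Formula (1): potency-weighted valence of success minus
    potency-weighted (negative) valence of failure."""
    return po_success * va_success - po_failure * va_failure


def level_of_aspiration(levels):
    """Formula (2): the level at which the resultant force is greatest."""
    return max(levels, key=lambda level: resultant_force(*levels[level]))


# Hypothetical levels of difficulty 1 (easy) to 5 (hard). Success becomes
# less likely but more valued as difficulty rises; failure becomes more
# likely but less painful.
levels = {
    1: (0.95, 1.0, 0.05, 5.0),
    2: (0.75, 3.0, 0.25, 4.0),
    3: (0.50, 6.0, 0.50, 3.0),
    4: (0.25, 8.0, 0.75, 2.0),
    5: (0.05, 10.0, 0.95, 1.0),
}
print(level_of_aspiration(levels))
```

With these made-up numbers the resultant force peaks at an intermediate level of difficulty, which is the qualitative pattern the resultant-force curve is meant to capture.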


In terms of current terminology, it
seems reasonable to interpret “resultant
force” as “net expected valence”
or “net expected utility.”⁹ This


9 Festinger’s notation and terminology are
rather confusing. He uses Po_s and Po_f to
indicate both “potency” and “expectancy.”
The author's interpretation is:
Va = valence (utility)
Po(Va) = Po · Va = potency (expected
utility)
Po = expectancy (subjective
probability)
f* = resultant force (net expected
utility).
This interpretation seems to be consistent
with Festinger’s example and his statement:
“This resultant force will be positive if
the failure factors (Va_f · Po_f) are less than the
success factors (Va_s · Po_s), since f*_{P,L} > 0
(pp. 241-242).” The later article by Lewin,
Dembo, Festinger, and Sears (1944) is considerably
less ambiguous.
These two articles use “level of activity”
and “level of difficulty” interchangeably and


is the interpretation used by Siegel 
(1957). Festinger’s proposition is 
then: The level of aspiration is set at 
that “level of difficulty” at which the 
net expected valence (or utility) is 
maximized.

If the level of aspiration is the 
reference point for subjective success 
or failure, Festinger’s proposition is 
contentless. Making the weak assumption
that S_jk > F_jk, expected
utility (valence) is monotonic with
respect to the level of aspiration. Expected
utility (valence) will be maximized
only if the level of aspiration
is set so low that all possible consequences
are successful consequences.
The maximization hypothesis has 
meaning only if it is stated that 
behavior alternatives are chosen to 
maximize expected utility given a level 
of aspiration. 
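The monotonicity claim can be written out explicitly. The following is a sketch under two assumptions that go slightly beyond the text: that U_jk = C_k + A_j, and that consequence k counts as a success exactly when U_jk + S_jk ≥ L. The indicator notation is introduced here only for illustration.

```latex
% Expected net utility of alternative j as a function of the level of aspiration L
E_j(L) \;=\; \sum_{k} P_{kj}\,
  \Bigl[\, U_{jk} + F_{jk}
        + \bigl(S_{jk} - F_{jk}\bigr)\,
          \mathbf{1}\{\,U_{jk} + S_{jk} \ge L\,\} \Bigr]

% Since S_{jk} - F_{jk} > 0 and each indicator can only switch from 1 to 0 as L rises,
L_1 \le L_2 \;\Longrightarrow\; E_j(L_1) \ge E_j(L_2),
\qquad
E_j(L) \text{ is largest for any } L \le \min_{k:\,P_{kj} > 0} \bigl(U_{jk} + S_{jk}\bigr).
```

Raising L can only turn successes into failures, so maximizing over L alone pushes the level of aspiration to the point where every possible consequence is a success.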

At least under the satisficing hy- 
pothesis, a level of aspiration can be 
defined without reference to utilities. 
Any statement of the form V > P (or 
V < P) could be interpreted as a 
statement about a level of aspiration. 
V could be any function of perceivable 
variables, and P would be the referent 
parameter. If V> P, raising the 
level of aspiration would mean an 
increase in P and lowering the level 
of aspiration would mean a decrease 
in P. If V < P, raising the level of 
aspiration would mean a decrease in 
P and lowering the level of aspiration 
would mean an increase in P. To be 
consistent with the treatment above, 
P would be the level of aspiration. 
However, the isolated parameter 
would have no substantive content; 


to refer to two different things. These terms 
are used first to identify behavior alternatives 
(“level of difficulty” apparently refers to our 
subscript j) and second to identify conse- 
quences (“level of difficulty” apparently refers 
to our subscript k). This is not unreasonable 
in the context set forth above where the 
utilities are functions of both j and k. 
P must be considered in association
with the function V which defines it. 
Therefore, when speaking of the level 
of aspiration, the complete statement, 
V > P, will be assumed. 

One reasonable hypothesis would 
be that changes in levels of aspiration 
are controlled by stability conditions 
associated with the goals (Starbuck, 
1958). Stability conditions are in- 
equalities defining the environmental 
cues which trigger a change in associated
levels of aspiration. Associated
with a particular level of
aspiration, V_0 > P_0, might be stability
conditions X_0 > C_0, X_1 ≥ C_1,
and X_2 < C_2 . . . where X_0, X_1, and
X_2 are again functions of perceivable
variables and C_0, C_1, and C_2 are
parameters. When perception and
experience satisfy the stability condi- 
tions, the level of aspiration is not 
changed. When perception and ex- 
perience violate one or more of the 
stability conditions, the level of 
aspiration is changed ; the direction of 
change depends upon which stability 
conditions are violated. 
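The mechanism can be sketched in code. Everything specific below is an assumption made for illustration: the perceivable variable (a recent success rate), the two thresholds, and the fixed step by which the parameter P is moved are all hypothetical, as are the class and function names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StabilityCondition:
    """One inequality on perceivable variables, plus the (hypothetical)
    adjustment applied to P when the inequality is violated."""
    name: str
    holds: Callable[[Dict[str, float]], bool]
    adjustment: float


def update_level(p: float, perception: Dict[str, float],
                 conditions: List[StabilityCondition]) -> float:
    """Leave P unchanged while every stability condition holds; otherwise
    move it in the direction attached to each violated condition."""
    for condition in conditions:
        if not condition.holds(perception):
            p += condition.adjustment
    return p


# Hypothetical conditions on a recent success rate: too much success
# raises the level of aspiration, too little lowers it.
conditions = [
    StabilityCondition("not too easy", lambda x: x["success_rate"] < 0.8, +0.05),
    StabilityCondition("not too hard", lambda x: x["success_rate"] >= 0.2, -0.05),
]

print(update_level(0.25, {"success_rate": 0.9}, conditions))  # condition violated: P rises
print(update_level(0.25, {"success_rate": 0.5}, conditions))  # all conditions hold: P unchanged
```

Which condition is violated determines the direction of change, which is the only property of stability conditions the sketch is meant to show.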

The stability conditions are es- 
sentially a learning mechanism at a 
highly operational level. They are 
undoubtedly themselves subject to 
higher order learning rules at a some- 
what less operational level. In this 
sense, stability conditions are the 
lowest echelon, relevant to goals, in a 
learning hierarchy.¹⁰

One type of goal change has been 
the center of a good deal of attention 
in the literature. Since levels of 
aspiration are basically operational, 
they must conform to the logical rules 
used by the problem solver. Under 
normal circumstances, simultaneously 
considered goals must be mutually 
consistent, and each goal individually 


10 March and Simon (1958) have discussed 
the operationality concept more thoroughly. 
See particularly Sections 3.4.2, 6.3, and 7.5. 


must be consistent with the perceived
environment, that is, with the problem
solver's map of his life space. He
must be able to satisfy simultaneously
considered goals simultaneously. If
his goals violate these logical con- 
straints, the problem solver must 
change either his goals or his map of 
the life space. Stability conditions 
will determine whether goals or per- 
ceptions will change under conditions 
of conflict. 

It was pointed out above that levels 
of aspiration are simply statements of 
preference, the functional form being 
relatively unconstrained. As a result, 
levels of aspiration are functionally 
similar to, if not indistinguishable 
from, attitudes and sentiments. Atti- 
tude studies should bear directly upon 
level of aspiration theory. In fact, 
attitude theorists have given a lot of 
attention to changes in cognitive 
structure due to quasi-logical con- 
straints. The condition which has 
been termed “consistency” above is 
basically the same as “congruence,”
“balance,” or “consonance” as others 
have used the terms. Work of par- 
ticular relevance has been done by 
Cartwright and Harary (1956), Fest- 
inger (1957), Heider (1946), Newcomb 
(1953), Osgood and Tannenbaum 
(1955), and Secord and Backman 
(1961). 


Goal Structure 


A level of aspiration is fundamen- 
tally an operational goal. It is a basis 
for action, stated with perceivable 
variables and in a simple way. On 
the other hand, a problem solver can 
have general, vaguely-defined, non- 
operational preferences which form a 
superstructure for his operational 
goals. As this ambiguous super- 
structure becomes more clearly de- 
fined, and as the problem solver learns 
more and more about his life space, 
his levels of aspiration will change.
New levels of aspiration will be
established; old levels of aspiration
will be revised or erased.¹¹

This is not to imply that levels of 
aspiration apply only to immediate 
events. Some levels of aspiration 
include an implicit or explicit timing 
condition which defines a future date. 
The problem solver might have a 
salary goal of $5,000 for today and 
a salary goal of $20,000 for 20 years 
hence. The long-run goals define vec- 
tors for change in the short-run goals 
over time. Moreover, a series of re- 
lated goals which vary in timing de- 
fines a pattern to be used for planning. 

The preference hierarchy implies 
that the problem solver has a fairly 
extensive library of specific levels of 
aspiration which vary in generality, 
timing, and the sectors of the life 
space to which they apply . . .
V_0 > P_0, V_1 ≥ P_1, V_2 < P_2, etc.
Obviously decisions must be made on 
the basis of a relatively small number 
of goals or the computational and 
logical demands on the problem solver 
would be tremendous. It is there- 
fore desirable to distinguish between 
evoked and latent levels of aspiration. 
Evoked levels of aspiration are those 
in the problem solver’s phenomenal 
field; he is conscious of them at the 
time he makes the decision. Latent 
levels of aspiration are not part of the 
problem solver’s phenomenal field; 
he is not aware of them at the time he 
makes the decision. 

Goals are evoked by cues from the 
environment and by the problem 
solving process itself, since the prob- 
lem solver must simulate his environ- 
ment when tracing the consequences 
of a particular behavior alternative. 
Presumably this evocation process 
tends to act on clusters of goals 


11 Cf. the Gestalt concepts of “silent
organization” and “typifying schema.” 


defined by the hierarchical goal structure.
Consequently, elements in the
set of evoked goals tend to be associated
with one another. If the
evocation links between goals are
subject to reinforcement when goals
are evoked simultaneously, the goal
hierarchy will tend to assume a
structure which is efficient relative to
the perceived environment.¹²


Implications for Research 


Research on level of aspiration 
theory has tended to focus on changes 
in aspiration resulting from experience 
and perception. There has been 
relatively little work done on the 
effects of the level of aspiration on 
behavior. Consequently, a number of 
interesting problems remain, two of 
which are especially interesting to the 
author. 

One basically empirical problem 
relates to the decision criterion itself. 
The viewpoint adopted above is that 
people maximize under some circum- 
stances and satisfice under other 
circumstances. However, there is 
little systematic evidence to indicate 
when one type of problem solving is 
evoked rather than the other, the 
theory being based on differences be- 
tween the assumptions which the two 
hypotheses make. 

One basically theoretical problem 
relates to the efficiency of subjectively 
defined success. On the basis of 
evolution and learning, one would 
expect to find that humans establish 
goals for themselves because it is an 
effective way to deal with their 
environment. That is, humans are 
more likely to survive if they use a 
decision method which incorporates 
subjective success and failure, than if 
they use a purely “objective” decision 


12 March and Simon (1958) have discussed 
the structure of goal hierarchies in organiza- 
tional problem solving (pp. 190-194). 
method. Whether this proposition is
true, and under what circumstances, 
remains to be shown. 

Three other problems suggested 
above seem to be worth exploring. 
First, can the referent behavior alter- 
native in Siegel's method be varied 
without changing the subject's level 
of aspiration? Second, what is the 
nature of the relationship between 
attitude theory and level of aspiration 
theory? Third, what is the role of a 
cognitive preference hierarchy in the 
control of goal evocation? 
The two-state “high” threshold model is generalized by assuming that
(with low probability) the threshold may be exceeded when there is
no stimulus. Existing Yes-No data (that rejected the high threshold
theory) are compatible with the resulting isosensitivity (ROC) curves,
namely, 2 line segments that intersect at the true threshold probabilities.
The corresponding 2-alternative forced-choice curve is a 45°
line through this intersection. A simple learning process is suggested
to predict S’s location along these curves, asymptotic means are derived,
and comparisons are made with data. These asymptotic biases are
coupled with the von Békésy-Stevens neural quantum model to show
how the theoretical linear psychometric functions are distorted into
nonsymmetric, nonlinear response curves.


A classic postulate of psychophysics 
is that some stimuli or differences 
between stimuli never manage to 
affect the central decision making 
centers; others, of course, do. In a
phrase, peripheral thresholds were 
assumed to exist. At least three types 
have been distinguished: absolute, 
difference, and detection. It is not, 
however, clear that there is any real 
difference among them. Absolute 
thresholds seem to be the same as 
detection ones except that the only 
noise is internal, and many difference 
threshold experiments differ from de- 
tection experiments only in the nature 
of the background stimulus, e.g., a 
pure tone or noise. 

Recently the literal interpretation 
of the threshold postulate has been 


1 This research was supported in part by
Grants NSF G-17637 and NSF G-8864 from 
the National Science Foundation to the 
University of Pennsylvania. 

I wish to express my appreciation to R. R. 
Bush, Eugene Galanter, Francis W. Irwin, W. 
D. Larkin, Donald Norman, and Elizabeth F. 
Shipley for the many discussions we have had 
of the ideas included in this paper. In addi- 
tion, Elizabeth F. Shipley has graciously al- 
lowed me to include portions of the data from 
her thesis, which shortly will be reported in 


questioned by some detection workers, 
e.g., Swets (1961), Swets, Tanner, & 
Birdsall (1961), and Tanner & Swets 
(1954a) who have argued that thresh- 
olds, if one still wishes to call them 
that, are introduced only at the 
central decision level itself. What is 
important in this view is that the 
value of the “response threshold”— 
usually it is called something else, 
such as a decision criterion or cutoff— 
is not a fixed feature of the organism, 
but rather it is a parameter under the 
control of the experimental instruc- 
tions, information feedback, payoffs, 
and other motivational factors. Two 
versions of such a threshold-free 
decision theory have been developed 
in detail. For signal detectability 
theory see Birdsall (1955), Green 
(1960), Licklider (1959), Peterson, 
Birdsall, and Fox (1954), Swets and 
Birdsall (1956), Swets, Tanner, and 
Birdsall (1955, 1961), Tanner (1955, 
1956), Tanner and Birdsall (1958), 
Tanner and Norman (1954), and 
Tanner and Swets (1954a, 1954b). 
For the choice theory see Luce (1959), 
Restle (1961), Shepard (1957), and 
Shipley (1960, 1961). A number of 
experiments have been reported which 
agree with the main features of both
theories. In addition to those re- 
ported in the above references, there 
are studies by Clarke, Birdsall, and 
Tanner (1959), Creelman (1959, 1960), 
Egan, Schulman, and Greenberg 
(1959), Green (1958), Green, Birdsall, 
and Tanner (1957), Shepard (1958), 
Swets (1959), Swets, Shipley, McKey, 
and Green (1959), and Veniar (1958a,
1958b, 1958c). Although the two
theories differ conceptually, their pre- 
dictions are so similar that it has been 
impossible as yet to decide between 
them. 

In the course of evaluating signal 
detectability theory, a contrasting but 
equally explicit, sensory threshold 
model has been stated (Swets, 1961; 
Swets et al., 1961; Tanner & Swets, 
1954a). It postulates that the thresh- 
old is well above the noise level. 
There is no doubt that this model is 
inadequate, and it has been concluded 
that if thresholds exist they must be 
so far down in the noise that the 
notion of a threshold “. . . is not a 
workable concept . . . [and] for 
practical purposes, not measurable” 
(Swets et al., 1961, p. 336). At least 
two sets of behavioral data do not 
jibe easily with this view. 

First, there are studies, beginning 
with von Békésy (1930) and Stevens, 
Morgan, and Volkmann (1941), of the 
detection of energy increments of a 
pure tone background. Some of the 
results reported seem consistent only 
with a quantal (threshold) model. 
Although a number of people are 
dissatisfied with aspects of the experi- 
mental procedure and although the 
psychometric function has not always 
been found to be rectilinear as pre- 
dicted by some quantum theorists, the 
recurring n: (n — 1) relation between 
the probability one and zero intercepts 
of the psychometric function has not 
been accounted for in any satisfactory 


way by a continuous, threshold-free 
model. The only published attempt 
that I know of is by Barlow (1961), 
and his rationalization seems com- 
pletely ad hoc to me. 

Second, Shipley (1961) has obtained 
some simultaneous detection and rec- 
ognition data which indirectly sug- 
gest that detection thresholds exist. 
On each trial either a 1,000-cps tone 
in noise, a 500-cps tone in noise, or 
noise alone was presented, and the 
subject was required to decide whether 
or not a tone was present and, in- 
dependent of his detection response, 
to attempt to recognize which it was. 
(Controls were run in which no 
recognition response was required and 
in which recognition was only required 
when the subject said a tone was 
present; there did not seem to be any 
interaction between the forced rec- 
ognition responses and the detection 
responses.) If we separate the two 
detection responses, then we can ask 
how well he recognizes when he says 
he heard a tone as against when he 
said he did not hear one. If there 
really is a sensory threshold and if he 
reports no tone present only when the 
threshold is not exceeded, then there 
should not be any differential recognition
of the tones on the no-detect
trials. This is what happens, as can
be seen in Table 1, for both the Yes-
No and forced-choice designs.

This paper has two main purposes.
First, a simple threshold model is
described which appears to give as
satisfactory an account of the response
data as do the continuous detection
theories. Second, a way is suggested
to graft onto this sensory threshold
model a decision process which predicts
in some detail the biasing effects
of information feedback, payoffs, and
presentation probabilities. It is noteworthy
that in conjunction with the
present threshold model the usually
assumed expected-payoff model is
completely unacceptable. Instead, a
learning process is postulated. Thus,
the present psychophysical theory is,
in part, an asymptotic learning theory,
as seems sensible.

TABLE 1

[Table body not recoverable. Surviving labels: Design (forced-choice) and Presentation (500 cps, 1,000 cps). Note: These data are from a simultaneous detection and recognition experiment in which a 500-cps tone in noise or a 1,000-cps tone in noise could be presented and the subject attempted to recognize which tone it was.]


YES-NO DETECTION EXPERIMENTS


It is generally agreed that one of 
the simpler detection experiments is 
the Yes-No design. On each trial un- 
ambiguous signals mark off a time 
interval during which either a back- 
ground or the background plus a 
stimulus² is presented, and the subject
is required to indicate whether or not 
he thinks the stimulus is there. Often 
the possible responses are said to be 


2 In the signal detectability literature, the
physical event to be detected by the subject 
has generally been called a “signal.” As 
long as one is working with tones in noise and 
the like, this does not seem inappropriate; 
however, nothing in this theory restricts one 
to signals in this sense, so I have elected to 
use the more general term “stimulus.” 


the are estimated separately for those triala when the sallert 
detected a tone (Yes columns) and for those when he said he did not 


quency but different energy. 
theless, I shall conventionally speak 
of the background as noise. 

Let n denote a typical presentation of noise, s a typical presentation
of stimulus plus noise, and Y the Yes and N the No responses. The
basic data are the relative frequencies of a Y response given s,
$\hat{p}(Y|s)$, and of Y given n, $\hat{p}(Y|n)$, which are assumed to
arise from and therefore to estimate the true conditional response
probabilities p(Y|s) and p(Y|n). With or without “hats,” it is clear
that p(N|s) = 1 - p(Y|s) and p(N|n) = 1 - p(Y|n). Our problem is,
first, to explain how these two conditional probabilities relate to
one another when we vary such experimental parameters as the a priori
probability P of presenting s (and so 1 - P of n), the physical
magnitudes of s and n, and the payoffs. The proposed answers, although
far from complete, permit some experimental evaluation of the model.
Second, given estimates of parameters from Yes-No data, we must try to
account for the data from other experimental designs involving the
same s, n, and subject.

We shall suppose that thresholds exist in the following sense. When
either the noise alone or the stimulus plus noise is presented, the
organism enters one of two hypothetical states denoted D and
$\bar{D}$. A “detection observation” will be said to have occurred
when he goes into State D and not to have occurred when he goes (or
stays) in $\bar{D}$. These states are assumed to be internal to the
subject and therefore cannot be directly observed in terms of
behavior. Whether they can be studied by physiological methods is an
open question that we need not discuss here. We do not suppose that
the same state necessarily results whenever a particular stimulus is
presented, but rather that the state entered is determined by a random
process that is characterized by fixed probabilities for a given
subject, stimulus, noise, and experiment. Just where the variability
enters in is not specified by the theory. The underlying conditional
probability model for these detection observations (not responses) is

Presentation    Presentation    Observation
probability                     D          $\bar{D}$
P               s               q(s)       1 - q(s)
1 - P           n               q(n)       1 - q(n)

In words, q(n) is the true probability that noise alone generates a
detection observation, i.e., that it “passes” the threshold, and q(s),
the true probability that stimulus plus noise generates a detection
observation. We assume that q(s) ≥ q(n).

In the absence of data, one might have supposed that p(Y|s) = q(s)
and p(Y|n) = q(n), but this cannot be because as a matter of fact the
values of $\hat{p}(Y|s)$ and $\hat{p}(Y|n)$ depend upon at least P,
the instructions, and the payoffs; and these differences are much too
large and systematic to be ascribed to variability in the data.
Evidently, then, the subject must convert some of the D observations
into N responses or some of the $\bar{D}$ observations into Y
responses, depending upon how he wishes to bias the outcome. On the
assumption that the D observations are all indistinguishable, or, at
least, that the s and n distributions of D observations are the same
and that this is also true of the $\bar{D}$ observations, it is
plausible that the bias involves responding “incorrectly” to some
random fraction of the observations. If so, we obtain two different
sets of equations depending upon which bias is introduced:
if p(Y|n) < q(n), then

\[
p(Y|s) = t\,q(s), \qquad p(Y|n) = t\,q(n), \tag{1}
\]

or if p(Y|n) > q(n), then

\[
p(Y|s) = q(s) + u[1 - q(s)], \qquad p(Y|n) = q(n) + u[1 - q(n)], \tag{2}
\]

where 0 ≤ t, u ≤ 1.

For the moment, we are not concerned about the actual values of the
bias parameters t and u; rather we assume that any particular value
can be made to arise, and we eliminate these unknowns from Equations 1
and 2 to obtain the dependence of p(Y|s) upon p(Y|n) with q(s) and
q(n) as parameters:


\[
p(Y|s) =
\begin{cases}
\dfrac{q(s)}{q(n)}\, p(Y|n), & \text{if } p(Y|n) \le q(n), \\[2ex]
\dfrac{1 - q(s)}{1 - q(n)}\, p(Y|n) + \dfrac{q(s) - q(n)}{1 - q(n)}, & \text{if } p(Y|n) > q(n).
\end{cases}
\tag{3}
\]

This equation describes a very simple function, namely, a straight
line segment from (0, 0) to (q(n), q(s)) and another from (q(n), q(s))
to (1, 1), the two portions of which we shall speak of as the lower
and upper limbs, respectively.

In the signal detectability literature the function relating p(Y|s) to
p(Y|n) has been called a receiver operating characteristic or, more
briefly, an ROC curve, but it seems more appropriate to call it an
isosensitivity curve.⁴ In that theory, it is truly a smooth curve.
Examples of curves generated by detectability theory are shown in
Figure 1 and of ones generated by Equation 3, in Figures 2 and 3.

⁴ I hope that the greater naturalness of this term, as compared with
equisensitivity, will be adequate compensation for mixing Greek and
Latin roots.

The high threshold model discussed by Swets (1961), Swets, Tanner, and
Birdsall (1961), and Tanner and Swets (1954a) is the special case of
this one in which q(n) = 0, and so it consists only of the upper limb,
i.e., of the line segment from (0, q(s)) on the ordinate to (1, 1).
This is not a satisfactory summary of the data, but the two line
segments of Equation 3 do about as well as any of the continuous
theories with the same number of free parameters, namely, two. For
example, Swets, Tanner, and Birdsall (1955, 1961) report data on
visual brightness for four subjects, where the payoffs were varied and
P was held at 1/2. And Tanner, Swets, and Green (1956) report acoustic
data on the detection of a 1,000-cps tone in white noise for two
subjects, where the (symmetric) payoffs were held fixed and P was
varied from 0.1 to 0.9 in steps of 0.2. In Figure 1, I have presented
the data and detectability curves for one subject from each
experiment, choosing in each case the subject that most favors signal
detectability theory. All of the data and the curves of the present
threshold model are shown in Figure 2 for the visual experiment and in
Figure 3 for the acoustic experiment. Throughout, the theoretical
curves were fit by eye, because no optimal statistical procedure is
known. (The theoretical crosses in Figure 3 will be discussed later.)
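
As a minimal sketch of Equations 1 to 3, the two limbs and the resulting isosensitivity function can be written out directly. Python is used only for illustration; the function names and the values q(n) = 0.2 and q(s) = 0.9 are assumptions of the sketch, not estimates from any of the data sets discussed here.

```python
import math


def lower_limb(t: float, q_n: float, q_s: float) -> tuple[float, float]:
    """Equation 1: a fraction t of the D observations is reported as Y."""
    return t * q_n, t * q_s  # (p(Y|n), p(Y|s))


def upper_limb(u: float, q_n: float, q_s: float) -> tuple[float, float]:
    """Equation 2: a fraction u of the not-D observations is reported as Y."""
    return q_n + u * (1 - q_n), q_s + u * (1 - q_s)


def isosensitivity(p_yn: float, q_n: float, q_s: float) -> float:
    """Equation 3: p(Y|s) as a function of p(Y|n); assumes 0 < q_n < 1."""
    if p_yn <= q_n:
        # Lower limb: straight line from (0, 0) to (q_n, q_s).
        return (q_s / q_n) * p_yn
    # Upper limb: straight line from (q_n, q_s) to (1, 1).
    return ((1 - q_s) / (1 - q_n)) * p_yn + (q_s - q_n) / (1 - q_n)


Q_N, Q_S = 0.2, 0.9  # illustrative values only
for bias in (0.25, 0.5, 0.75):
    x_low, y_low = lower_limb(bias, Q_N, Q_S)
    x_up, y_up = upper_limb(bias, Q_N, Q_S)
    # Each limb point should lie on the curve defined by Equation 3.
    print(math.isclose(isosensitivity(x_low, Q_N, Q_S), y_low),
          math.isclose(isosensitivity(x_up, Q_N, Q_S), y_up))
```

Eliminating t or u numerically in this way gives the same two line segments as the closed form, which is all that Equation 3 asserts.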

In evaluating the acoustic data, two facts are important. First, the
p(Y|s) coordinate of each data point is based upon a sample of 300P
observations and the p(Y|n) coordinate on 300(1 - P) observations.
Second, successive pairs of points, reading around the curve, were
generated under identical experimental conditions. Thus, there can be
little doubt that there is variability beyond the binomial associated
with each observation point.

I would judge that the visual data 
slightly favor the threshold model and 
the acoustic data, the detectability 
model. Although different modalities 
may well involve different processes, 
neither set of data seems particularly 
conclusive. One feature of both sets, 
however, casts suspicion upon the 
present threshold model. The model 
says that the isosensitivity curve has 
a sharp corner which all too often 
seems to float free of these data points. 
Of course, this is exactly what would happen were the true function a
cornerless curve; hence, this threshold model does not deserve serious
consideration unless the lonely corners are explained. A reason is
suggested later.

FIG. 1. Yes-No detection data and the corresponding theoretical
isosensitivity curves derived from signal detectability theory. (The
visual brightness data, reported by Swets et al., 1955, 1961, were
obtained under the same stimulating conditions with a presentation
probability of 0.5, but with different payoff matrices. The detection
of a tone in noise data, reported by Tanner et al., 1956, were obtained
under the same stimulating conditions with a fixed symmetric payoff
matrix, but with different presentation probabilities.) [Axes: p(Y|s)
against p(Y|n). Legible panel label: Acoustic, Subject 2.]


TWO-ALTERNATIVE FORCED-CHOICE
EXPERIMENTS


FIG. 2. Yes-No visual brightness detection data from Swets, Tanner,
and Birdsall (1955, 1961) and the corresponding theoretical
isosensitivity curves derived from a threshold theory. (Each
coordinate of each point is based upon 200 observations.)

FIG. 3. Yes-No acoustic (tone in noise) detection data from Tanner,
Swets, and Green (1956) and the corresponding theoretical
isosensitivity curves derived from a threshold theory. [Legible panel
labels: Subject 1, Subject 2; ordinate p(Y|s).]

In the two-alternative forced-choice design two time intervals are
defined and the stimulus is, and is known to be, in exactly one. Thus,
the two presentations are the ordered pairs (s, n) and (n, s). The
subject responds by saying which interval, 1 or 2, he believes to have
contained the stimulus. Assuming the above threshold formulation,
there are four possible observations, (D, D), (D, $\bar{D}$),
($\bar{D}$, D), and ($\bar{D}$, $\bar{D}$), of which two, (D, D) and
($\bar{D}$, $\bar{D}$), give the subject no indication of which
response to make. It seems plausible, at least when the payoffs are
not too extreme, that the subject should apply biases only to these
two ambiguous cases. Thus, we assume that he always responds 1 when
(D, $\bar{D}$) occurs, never when ($\bar{D}$, D) occurs, some
proportion v when (D, D) occurs, and another proportion w when
($\bar{D}$, $\bar{D}$) occurs. If we assume that the observation
probabilities in the two intervals are independent, then the
probability of, say, a (D, $\bar{D}$) observation when (s, n) is
presented is q(s)[1 - q(n)] because q(s) is the probability that the
stimulus plus noise exceeds the threshold and 1 - q(n) is the
probability that noise alone fails to exceed it. The other cases are
similar, and they lead to

\[
p(1|(s, n)) = q(s)[1 - q(n)] + v\,q(s)q(n) + w[1 - q(s)][1 - q(n)], \tag{4}
\]

\[
p(1|(n, s)) = q(n)[1 - q(s)] + v\,q(n)q(s) + w[1 - q(n)][1 - q(s)], \tag{5}
\]

where 0 ≤ v, w ≤ 1. It follows by subtraction that

\[
p(1|(s, n)) = p(1|(n, s)) + q(s) - q(n), \tag{6}
\]

so the isosensitivity curve is a line segment with slope 1 running
from (q(n)[1 - q(s)], q(s)[1 - q(n)]) to
(1 - q(s)[1 - q(n)], 1 - q(n)[1 - q(s)]). Thus, for example, if
q(s) = 0.9 and q(n) = 0.2, the segment runs from (0.02, 0.72) to
(0.28, 0.98).
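
The worked example can be checked with a short sketch of Equations 4 and 5. The function name `forced_choice_point` is a label chosen for this illustration; the values q(s) = 0.9 and q(n) = 0.2 are the ones used in the text.

```python
import math


def forced_choice_point(v: float, w: float, q_n: float, q_s: float) -> tuple[float, float]:
    """Return (p(1|(n, s)), p(1|(s, n))) from Equations 5 and 4.

    v is the proportion of 1 responses to (D, D) observations and
    w the proportion of 1 responses to (not-D, not-D) observations.
    """
    p1_sn = q_s * (1 - q_n) + v * q_s * q_n + w * (1 - q_s) * (1 - q_n)
    p1_ns = q_n * (1 - q_s) + v * q_n * q_s + w * (1 - q_n) * (1 - q_s)
    return p1_ns, p1_sn


Q_N, Q_S = 0.2, 0.9
start = forced_choice_point(0.0, 0.0, Q_N, Q_S)  # no ambiguous case answered 1
end = forced_choice_point(1.0, 1.0, Q_N, Q_S)    # every ambiguous case answered 1
print(start, end)

# Equation 6: the vertical offset from the diagonal is q(s) - q(n) for any biases.
for v, w in ((0.0, 0.0), (0.3, 0.8), (1.0, 1.0)):
    x, y = forced_choice_point(v, w, Q_N, Q_S)
    print(math.isclose(y - x, Q_S - Q_N))
```

Because the two bias terms enter Equations 4 and 5 identically, every choice of v and w moves the point along the same slope-1 line, which is why only the segment endpoints depend on the biases.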

It is also easy to see from Equations 4 and 5 that when
v = w = q(n)q(s)/{q(n)q(s) + [1 - q(n)][1 - q(s)]}, then
p(1|(s, n)) = q(s) and p(1|(n, s)) = q(n). That is, the
two-alternative forced-choice isosensitivity curve passes through the
point whose coordinates are the true threshold probabilities.

These last two remarks, coupled with the results about the Yes-No
design, give a way to estimate the true threshold probabilities.
Suppose for the moment that the several response probabilities are
known. The point (q(n), q(s)) lies both on the 45° line passing
through (p(1|(n, s)), p(1|(s, n))) and on a line passing through
(p(Y|n), p(Y|s)) and either (0, 0) or (1, 1), depending upon which
limb of the Yes-No model is involved. Thus, the intersection of one of
these two pairs of lines is the point (q(n), q(s)). The geometry is
shown in Figure 4.
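
The intersection just described can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used on the data: the function name, argument layout, and the numerical points (generated from q(n) = 0.2, q(s) = 0.9 with biases of 0.5) are all hypothetical, and no handling is included for intersections that fall outside the unit square.

```python
import math


def estimate_thresholds(yes_no: tuple[float, float],
                        forced_choice: tuple[float, float],
                        limb: str) -> tuple[float, float]:
    """Intersect the 45-degree line through the forced-choice point with a limb line.

    yes_no        = (p(Y|n), p(Y|s))
    forced_choice = (p(1|(n, s)), p(1|(s, n)))
    limb          = "lower" (line through (0, 0)) or "upper" (line through (1, 1))
    Returns the estimate (q(n), q(s)).
    """
    x_y, y_y = yes_no
    x_f, y_f = forced_choice
    d = y_f - x_f  # by Equation 6 this difference estimates q(s) - q(n)
    if limb == "lower":
        slope = y_y / x_y              # line y = slope * x
        q_n = d / (slope - 1)          # solve slope * x = x + d
    elif limb == "upper":
        slope = (1 - y_y) / (1 - x_y)  # line y = 1 - slope * (1 - x)
        q_n = 1 - d / (1 - slope)      # solve 1 - slope * (1 - x) = x + d
    else:
        raise ValueError("limb must be 'lower' or 'upper'")
    return q_n, q_n + d


# Hypothetical error-free points generated from q(n) = 0.2, q(s) = 0.9.
print(estimate_thresholds((0.10, 0.45), (0.15, 0.85), "lower"))
print(estimate_thresholds((0.60, 0.95), (0.15, 0.85), "upper"))
```

With noisy data the two limbs generally give different intersections, so both are worth computing before deciding which limb the subject was using.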

So far as I know, no empirical 
isosensitivity curves have been pub- 
lished for the two-alternative forced- 
choice experiment, so we cannot check 
our prediction that it is a straight line 
with slope 1. This prediction differs 
considerably from the curve—which 
is also symmetric about the diagonal 
from (0,1) to (1,0)—predicted by 
signal detectability theory. 


ASYMPTOTIC LEARNING


We turn next to the question of the values of the biasing parameters,
t, u, v, and w. In the decision and choice theory models for these
experiments, it has been customary to assume that the subject selects
values for the biasing parameters so as to maximize his expected
payoff. Let the payoff structure be

Presentation    Presentation    Response
probability                     Y            N
P               s               $o_{11}$     $o_{12}$
1 - P           n               $o_{21}$     $o_{22}$

then if the subject is on the lower limb of the threshold model the
expected payoff is

\[
\begin{aligned}
E(o) &= P\,p(Y|s)\,o_{11} + P[1 - p(Y|s)]\,o_{12}
      + (1 - P)\,p(Y|n)\,o_{21} + (1 - P)[1 - p(Y|n)]\,o_{22} \\
     &= t\,[P\,q(s)(o_{11} - o_{12}) + (1 - P)\,q(n)(o_{21} - o_{22})]
      + P\,o_{12} + (1 - P)\,o_{22}.
\end{aligned}
\]

Because this equation is linear in t, the maximum occurs either at
t = 0 or t = 1. A similar calculation for the upper limb yields either
u = 0 or u = 1. Thus, the expected payoff model places the subjects at
one of three points: (0, 0), (q(n), q(s)), or (1, 1). This is clearly
wrong (see Figures 2 and 3).
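
The linearity argument can be made concrete with a small sketch. The payoff values below (a symmetric matrix of +1 and -1) and the threshold probabilities are illustrative assumptions only, and the function names are hypothetical.

```python
import math


def expected_payoff_lower(t, P, q_n, q_s, o11, o12, o21, o22):
    """Expected payoff on the lower limb, where p(Y|s) = t q(s) and p(Y|n) = t q(n)."""
    p_ys, p_yn = t * q_s, t * q_n
    return (P * p_ys * o11 + P * (1 - p_ys) * o12
            + (1 - P) * p_yn * o21 + (1 - P) * (1 - p_yn) * o22)


def payoff_maximizing_t(P, q_n, q_s, o11, o12, o21, o22):
    """The payoff is linear in t, so the maximum sits at an endpoint of [0, 1]."""
    slope = P * q_s * (o11 - o12) + (1 - P) * q_n * (o21 - o22)
    return 1.0 if slope > 0 else 0.0


payoffs = dict(o11=1.0, o12=-1.0, o21=-1.0, o22=1.0)  # illustrative symmetric payoffs
for P in (0.1, 0.5, 0.9):
    grid = [i / 100 for i in range(101)]
    best_on_grid = max(grid, key=lambda t: expected_payoff_lower(t, P, 0.2, 0.9, **payoffs))
    print(P, best_on_grid, payoff_maximizing_t(P, 0.2, 0.9, **payoffs))
```

A grid search over t never selects an interior value, which is the prediction the data in Figures 2 and 3 contradict.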

Whether this prediction is wrong 
because of the threshold model or be- 
cause of the expected payoff model is 
less easy to decide. One thing about 
the expected payoff model should be 
noted: knowledge of the two subject- 
determined conditional probabilities 
p(Y|s) and p(Y|n) is needed to
calculate the values of the parameters. 
Certainly no one will claim that the 
subject “knows” these, even un- 
consciously, in a way that he can 
actually calculate expected values; 
more likely, he arrives at his biases by 
a process of adjusting to his experience 
—by learning. It is curious that no 
one has yet evolved a learning theory 
which, asymptotically, predicts the 
maximization of expected values. An alternative, and to my mind more
reasonable, tack is to postulate directly a learning process,
preferably one that has already achieved some success in other areas,
and to test its asymptotic predictions against behavior. This we do.⁵

Consider a subject who is operating on the lower limb of the Yes-No
isosensitivity curve, and suppose that on Trial i the bias is $t_i$.
What is the bias $t_{i+1}$ on Trial i + 1? Of the events occurring in
Trial i, the only two that the subject should rationally take into
account in modifying the bias are his observation, D or $\bar{D}$, and
what he later learned the presentation to have been, s or n. If he is
rational, he certainly should not let the response he made on Trial i
or, for that matter, on any of the preceding trials influence his
choice of bias. Because on the lower limb the bias only tells him how
often to respond N to D observations, it seems clear that he should
not change it when a $\bar{D}$ observation occurs. When a D
observation results from an s presentation, the bias certainly should
not be lowered, which would only decrease the Y responses, and it
should not be increased when a D observation results from an n
presentation. That is, we expect

\[
t_{i+1}
\begin{cases}
\ge t_i & \text{if } (s, D), \\
= t_i & \text{if } (s, \bar{D}), \\
= t_i & \text{if } (n, \bar{D}), \\
\le t_i & \text{if } (n, D).
\end{cases}
\]

FIG. 4. The geometry relating the Yes-No isosensitivity curves, the
two-alternative forced-choice isosensitivity curves, and the true
threshold probabilities. [Legible label: LOWER LIMB.]

⁵ Conceptually, there is no special affinity between learning and
thresholds, but in practice there are good reasons why it is easier to
graft a learning mechanism on this threshold model than on the signal
detectability model. In both there are three classes of events that a
subject might use to control his learning: the hypothetical internal
observations, his responses, and what he learns the presentation to
have been. Both the first and third of these events form statistically
stationary processes over trials, whereas the response probabilities
are changing. Thus, if the learning process is dependent upon the
responses, the resulting stochastic learning model is mathematically
quite complex and I do not know how to analyze it. Although the other
two classes of events do not have this particular complexity, the
first can introduce a different kind. The internal observations that
are assumed to occur in the detectability model take on values in a
continuum, and so the learning model for this case must be continuous,
and such models are not yet very well understood. The threshold model
has the distinct advantage that there are only a small number of
observation states, which results in a mathematically simple learning
process.


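The exact nature of the transition is not obvious; however, a linear
operator (Bush & Mosteller, 1955) is certainly one of the simplest
possibilities and one that has received considerable attention. So we
assume⁶

\[
t_{i+1} =
\begin{cases}
(1 - \theta)\,t_i + \theta & \text{if } (s, D), \\
t_i & \text{if } (s, \bar{D}) \text{ or } (n, \bar{D}), \\
(1 - \theta')\,t_i & \text{if } (n, D).
\end{cases}
\tag{7}
\]

Then

\[
\begin{aligned}
E(t_{i+1} \mid t_i) &= P\,p(D|s)\,[(1 - \theta)t_i + \theta] + P\,p(\bar{D}|s)\,t_i \\
&\quad + (1 - P)\,p(D|n)(1 - \theta')\,t_i + (1 - P)\,p(\bar{D}|n)\,t_i \\
&= t_i\{P\,q(s)(1 - \theta) + P[1 - q(s)] + (1 - P)\,q(n)(1 - \theta') + (1 - P)[1 - q(n)]\} \\
&\quad + P\,q(s)\,\theta.
\end{aligned}
\]

Taking expectations over $t_i$ and then the limit as i → ∞ yields as
the asymptotic expected bias

\[
t_\infty = \lim_{i \to \infty} E(t_i) = \frac{q(s)}{q(s) + b\,q(n)}, \tag{8}
\]

where

\[
b = \left(\frac{1 - P}{P}\right)\left(\frac{\theta'}{\theta}\right). \tag{9}
\]

Similarly, we postulate the following learning process for the upper
limb:

\[
u_{i+1} =
\begin{cases}
u_i & \text{if } (s, D) \text{ or } (n, D), \\
(1 - \theta)\,u_i + \theta & \text{if } (s, \bar{D}), \\
(1 - \theta')\,u_i & \text{if } (n, \bar{D}),
\end{cases}
\tag{10}
\]

and a parallel calculation gives

\[
u_\infty = \lim_{i \to \infty} E(u_i) = \frac{1 - q(s)}{1 - q(s) + [1 - q(n)]\,b}. \tag{11}
\]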
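
As a check on Equations 8 and 11, the expected-value recursions can be iterated numerically and compared with the closed forms. This is only a sketch: the parameter values (P = 0.3, learning rates 0.10 and 0.05, q(n) = 0.2, q(s) = 0.9) and the function names are assumptions made for the illustration, and the recursion tracks the expected bias rather than simulating individual trials.

```python
import math


def b_parameter(P: float, theta: float, theta_prime: float) -> float:
    """Equation 9."""
    return ((1 - P) / P) * (theta_prime / theta)


def t_infinity(q_n: float, q_s: float, b: float) -> float:
    """Equation 8: asymptotic expected lower-limb bias."""
    return q_s / (q_s + b * q_n)


def u_infinity(q_n: float, q_s: float, b: float) -> float:
    """Equation 11: asymptotic expected upper-limb bias."""
    return (1 - q_s) / ((1 - q_s) + b * (1 - q_n))


def iterate_expected_bias(start, p_up, p_down, theta, theta_prime, n_trials=5000):
    """Iterate E(x_{i+1}) = p_up[(1-theta)x + theta] + p_down(1-theta')x + (1-p_up-p_down)x."""
    x = start
    for _ in range(n_trials):
        x = (p_up * ((1 - theta) * x + theta)
             + p_down * (1 - theta_prime) * x
             + (1 - p_up - p_down) * x)
    return x


P, THETA, THETA_PRIME, Q_N, Q_S = 0.3, 0.10, 0.05, 0.2, 0.9  # illustrative values
b = b_parameter(P, THETA, THETA_PRIME)

# Lower limb (Equation 7): increase on (s, D), decrease on (n, D).
t_iter = iterate_expected_bias(0.5, P * Q_S, (1 - P) * Q_N, THETA, THETA_PRIME)
# Upper limb (Equation 10): increase on (s, not-D), decrease on (n, not-D).
u_iter = iterate_expected_bias(0.5, P * (1 - Q_S), (1 - P) * (1 - Q_N), THETA, THETA_PRIME)

print(t_iter, t_infinity(Q_N, Q_S, b))
print(u_iter, u_infinity(Q_N, Q_S, b))
```

The fixed point depends on the learning rates only through their ratio, which is why a single parameter b suffices in Equations 8 and 11.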


According to Equation 9, the quantity b depends upon P and upon the
two learning rate parameters θ and θ′, which presumably in turn depend
upon the payoffs. We have no theory for this dependence, so in general
b will have to be estimated from the data or, as when we assume the
learning rates to be equal for symmetric payoffs, an assumption will
have to be made about θ and θ′. Clearly, if θ ≠ 0 and θ′ ≠ 0, b ranges
from 0 when P = 1 to ∞ when P = 0. At some point when P is varied the
subject presumably changes from operating upon the upper limb to the
lower limb. (See Figure 3 where P varies from 0.1 to 0.9.) We do not
have a general theory for when this change occurs, but it seems
plausible that it should be somewhere in the middle range of P values.
If so, then $u_\infty$ is bounded away from 0 to the extent that q(s)
is less than 1 and $t_\infty$ is bounded away from 1 to the extent
that q(n) is greater than 0. Or translated back to the isosensitivity
curve, the upper limb data points are prevented from being near the
corner of the curve to the extent that q(s) is less than 1 and the
lower limb points are prevented from being near it to the extent that
q(n) is greater than 0. An examination of Figures 2 and 3 suggests
that the data are consistent with this statement, which may explain
why the corners seem to be isolated. It also suggests that information
feedback need not always be beneficial in inducing subjects to yield
up the desired information, in this case, the true threshold
probabilities, as is often assumed by modern psychophysicists.

⁶ Although I will state the operators in terms of the bias parameters,
they could equally well be stated in terms of the response
probabilities because these probabilities are linear functions of the
bias parameters.

For the two-alternative forced-choice model, we assume essentially the
same learning process, namely

\[
v_{i+1} =
\begin{cases}
(1 - \theta)\,v_i + \theta, & \text{if } (s, n) \text{ and } (D, D), \\
(1 - \theta')\,v_i, & \text{if } (n, s) \text{ and } (D, D), \\
v_i, & \text{otherwise.}
\end{cases}
\]

A calculation similar to that for the Yes-No experiment yields

\[
v_\infty = 1/(1 + b). \tag{12}
\]

Similarly,

\[
w_{i+1} =
\begin{cases}
(1 - \theta)\,w_i + \theta, & \text{if } (s, n) \text{ and } (\bar{D}, \bar{D}), \\
(1 - \theta')\,w_i, & \text{if } (n, s) \text{ and } (\bar{D}, \bar{D}), \\
w_i, & \text{otherwise}
\end{cases}
\]

yields

\[
w_\infty = 1/(1 + b). \tag{13}
\]

We note that $v_\infty = w_\infty$, as seems a priori reasonable, and
that neither bias depends upon the underlying probabilities q(n) and
q(s), as the biases do in the Yes-No experiment. We also note that if
P = 1/2 and if the learning rates are equal, then the biases are
symmetric in the sense that $v_\infty = w_\infty = 1/2$.
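
That the forced-choice asymptote does not depend upon q(n) and q(s) can be seen by iterating the expected value of the operator for v. The sketch below assumes illustrative values (P = 0.3, learning rates 0.2 and 0.1) and a hypothetical function name; it follows the mean of the process, not individual trials.

```python
import math


def expected_v_asymptote(P, theta, theta_prime, q_n, q_s, start=0.5, n_trials=20000):
    """Iterate the expected forced-choice bias for (D, D) observations.

    v increases after (s, n) with (D, D), probability P q(s) q(n),
    and decreases after (n, s) with (D, D), probability (1 - P) q(n) q(s).
    """
    p_both_detect = q_s * q_n  # (D, D) has the same probability under either order
    v = start
    for _ in range(n_trials):
        v = (P * p_both_detect * ((1 - theta) * v + theta)
             + (1 - P) * p_both_detect * (1 - theta_prime) * v
             + (1 - p_both_detect) * v)
    return v


P, THETA, THETA_PRIME = 0.3, 0.2, 0.1   # illustrative values only
b = ((1 - P) / P) * (THETA_PRIME / THETA)  # Equation 9
for q_n, q_s in ((0.2, 0.9), (0.05, 0.6), (0.4, 0.5)):
    print(expected_v_asymptote(P, THETA, THETA_PRIME, q_n, q_s), 1 / (1 + b))
```

The threshold probabilities change only how quickly the expected bias converges, since they scale how often an ambiguous (D, D) observation occurs; the limit is 1/(1 + b) in every case.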


EMPIRICAL TESTS 


To test the model, we have four sets 
of data, all collected on W. P. Tanner’s 
equipment in the Psychophysical Lab- 
oratory, Electronic Defense Group, 
University of Michigan. Shipley 
(1961) ran each of three subjects in, 
among other conditions, the Yes-No 
and two-alternative forced-choice de- 
signs with P = 0.5 and with sym- 
metric payoffs. Each condition was 
run twice with different stimuli, pure 
tones of 500 and 1000 cps. Each pres- 
entation, s or n in the Yes-No and 
(s,n) or (n,s) in the forced-choice 
design, occurred 800 times. Using the 
estimation scheme of Figure 4, values 
for q(n) and q(s) were obtained for 
both limbs. If either or both inter- 
sections lay outside the unit square, 
I selected the intersection of the 45° 
line through the forced-choice data 
point and the edge of the unit square 
as the final estimate. This incorrectly 
attributes all of the error variance to 
the Yes-No data point; however, 
because of the location of the two 
points, the forced-choice point un- 


doubtedly has somewhat less binomial 
variance. Moreover, the learning 
process itself introduces added vari- 
ance which more seriously affects our 
estimate of the Yes-No point than 
of the forced-choice one. Once q(n)
and q(s) are estimated, then the
theoretical location of the data points
on the isosensitivity curves is
determined by Equations 1 and 8 or 2
and 11 for the Yes-No experiment
and by Equations 4, 5, 12, and 13 for
the forced-choice experiment,
provided that we know b. If we assume
equal learning rates, then b = 1. The
comparison between data and theory
under that assumption is shown in 
Figure 5; it is surprisingly good, but 
unfortunately it does not permit us to 
decide which limb is being used. 
There is some suggestion that it may 
be the lower one, for in four of the 
six cases that and only that inter- 
section is in the unit square, but this is 
far from conclusive. 

Swets (1959) reported similar data 
on three subjects for several different 
signal to noise ratios. The plots are 
similar to those for Shipley’s data; 
however, the predictions do not seem 
to be quite so accurate. In part this 
is due to the smaller sample sizes used 
by Swets. 

Next, we have the acoustic data 
from Tanner, Swets, and Green (1956) 
which were presented in Figure 3. 
Again, because the payoffs were sym- 
metric, we assume equal learning 
parameters, so b is determined by P. 
The predicted values, assuming that 
the P = 0.1 and 0.3 points are on the 
lower limb and that the rest are on the 
upper one are shown as crosses in 
Figure 3; recall that successive pairs of 
data points were collected under iden- 
tical experimental conditions. The 
predictions seem satisfactory for Sub- 
ject 1, but less so for Subject 2. 
Because no study has yet been made of the learning process itself, I
do not know how adequate the assumption θ = θ′ is, but to the extent
that it is wrong errors are introduced into our predictions.

FIG. 5. Yes-No and two-alternative forced-choice acoustic (tone in
noise) data reported by Shipley (1961). (The theoretical curves are
from the threshold model and the predicted values, shown as crosses,
are asymptotic values derived from a linear learning process.)
Comparable predictions for the visual data in Figure 2 are less easy
to make because the isosensitivity curve was generated by varying the
payoffs, not P. The only information that we have about the payoffs
used are the numbers

\[
\beta = \left(\frac{1 - P}{P}\right)\left(\frac{o_{22} - o_{21}}{o_{11} - o_{12}}\right),
\]

which are the relevant criterion quantities if one assumes that the
subjects maximize expected payoffs and that they are described by the
signal detectability model. If we assume that the learning rate
parameter associated with s presentations, θ, is proportional to the
difference of the two s payoffs, $o_{11} - o_{12}$, and that θ′ is
proportional to $o_{22} - o_{21}$, then

\[
b = \left(\frac{1 - P}{P}\right)\frac{\theta'}{\theta} = K\beta.
\]
Thus, in addition to q(n) and q(s), there is the free parameter K to
be chosen when fitting data. The values for q(n) and q(s) we get from
Figure 2. Because it is reasonable that the two constants of
proportionality relating θ and θ′ to payoffs might be the same, I
first tried using K = 1 to predict the responses. For three of the
subjects this seemed satisfactory, but by trial and error I found that
K = 0.5 is a much better choice for Subject 1. The results are shown
in Table 2. Note the rather sharp break in both the observed and
predicted values of $p_\infty(Y|n)$ as one moves from the lower to the
upper limb, as indicated by the bold face vertical bars in the table,
even though the changes in β are small in that region.


TABLE 2
ASYMPTOTIC LEARNING MODEL PREDICTIONS OF VISUAL DATA

Sub. q(n)  q(s)  K    β:                   8    8    6    4  2.5    2  1.5    1  .75  .75  .50  .25  .16
1    …     …     .5   p(Y|s) observed     64   64   82   79   --   63|  85   77   89   84   88   91   96
                             predicted    64   64   67   72   --   76|  84   86   87   87   89   93   95
                      p(Y|n) observed      2    3    2    6   --    5|  20   32   19   29   50   65   78
                             predicted     4    4    4    4   --    5|  26   33   39   39   48   65   74
2    …     …     1    p(Y|s) observed     48   59   73   75   71|  80   90   84   87   89   95   93   95
                             predicted    58   58   63   69   74|  86   86   87   88   88   89   91   93
                      p(Y|n) observed      0    0    2    3    4|  11   19   23   18   25   25   41   66
                             predicted     3    3    4    4    4|  12   14   18   22   22   28   42   52
3    …     …     1    p(Y|s) observed     45   56   --   64   73   65|  78   90   84   87   88   85   94
                             predicted    64   64   --   70   76   77|  85   86   86   86   87   90   92
                      p(Y|n) observed      0    2   --    2    8    1|  15   18   14   24   12   33   59
                             predicted     2    2   --    3    3    3|  13   18   21   21   28   43   54
4    .07   .74   1    p(Y|s) observed     43   43   64   64   56|  71   76   77   84   86   83   87   93
                             predicted    42   42   47   53   60|  77   78   79   81   81   83   84   91
                      p(Y|n) observed      0    4    6   12    5|  17   14   18   34   34   35   49   74
                             predicted     4    4    4    5    6|  18   22   27   32   32   40   42   66

Note. Values predicted by the asymptotic learning model for visual
detection data reported by Swets, Tanner, and Birdsall (1955, 1961).
The vertical line (|) indicates the transition from lower to upper
limb. Each observed proportion is estimated from 200 observations.
Decimal points have been systematically omitted.

k-ALTERNATIVE FORCED-CHOICE
EXPERIMENTS

The two-alternative forced-choice design can be readily generalized to
one having k intervals, exactly one of which contains the stimulus. It
is not easy to work out the response probabilities for any model,
including this one, except under the assumption that the asymptotic
response biases are equal. In the two-alternative case, this means
setting $v_\infty = 1/2$, which is what happens in the learning model
if the learning rates are equal and P = 1/2. In general, it means that
in any ambiguous situation the several possibilities are used equally
often. The effect of this symmetry assumption is to make the
probability of a correct response independent of the stimulus
presentation. Correct responses can occur in the following ways: A
response is always correct when the stimulus produces a D observation
and the k - 1 noise presentations produce $\bar{D}$ observations; it
is correct one-half the time when s and exactly one n produce D
observations, which can happen in $\binom{k-1}{1}$ ways; it is correct
one-third of the time when s and exactly two n’s produce D
observations; etc.; and it is correct one-kth of the time when all
intervals produce $\bar{D}$ observations. These are the only ways a
correct response can occur, so

\[
\begin{aligned}
p_k(C) &= \sum_{i=0}^{k-1} \frac{1}{i+1}\binom{k-1}{i}\, q(s)\,q(n)^i\,[1 - q(n)]^{k-1-i}
        + \frac{1}{k}\,[1 - q(s)][1 - q(n)]^{k-1} \\
       &= \frac{1}{k\,q(n)}\Bigl\{q(s) - [1 - q(n)]^{k-1}\,[q(s) - q(n)]\Bigr\}.
\end{aligned}
\tag{14}
\]

For k = 2, $p_2(C)$ = (1 + Δ)/2, where Δ = q(s) - q(n), whereas for
k > 2, $p_k(C)$ depends upon both q(n) and q(s) and not just upon
their difference. Thus, in contrast to other theories, $p_k(C)$ is not
uniquely determined by $p_2(C)$. To get an idea of the freedom
involved, assume Δ is fixed, then the limiting possibilities are when
q(n) = 0 and q(s) = Δ, in which case

\[
p_k(C) = [\Delta(k - 1) + 1]/k, \tag{15}
\]

and when q(n) = 1 - Δ and q(s) = 1, in which case

\[
p_k(C) = \frac{1}{k}\left(\frac{1 - \Delta^k}{1 - \Delta}\right). \tag{16}
\]

FIG. 6. Maximum and minimum curves of the proportion of correct
responses in k-alternative forced-choice designs. (The data points for
the detection of a tone in noise are from Swets, 1959.)

Typical examples of these bounds are shown in Figure 6. The data
points are from Swets (1959); clearly they fall within the bounds.
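
A short sketch may help in reading Equations 14 to 16. It evaluates the sum over the ways a correct response can occur, compares it with the closed form (which assumes q(n) > 0), and evaluates the two limiting cases for a fixed difference Δ. The function names and the value Δ = 0.5 are assumptions of the illustration.

```python
import math
from math import comb


def p_correct_sum(k: int, q_n: float, q_s: float) -> float:
    """Equation 14 as a sum: s gives D and i noise intervals give D, plus the all-not-D case."""
    detected = sum(
        comb(k - 1, i) / (i + 1) * q_s * q_n**i * (1 - q_n) ** (k - 1 - i)
        for i in range(k)
    )
    nothing_detected = (1 - q_s) * (1 - q_n) ** (k - 1) / k
    return detected + nothing_detected


def p_correct_closed(k: int, q_n: float, q_s: float) -> float:
    """Closed form of Equation 14; requires q_n > 0."""
    return (q_s - (1 - q_n) ** (k - 1) * (q_s - q_n)) / (k * q_n)


def bound_when_qn_is_zero(k: int, delta: float) -> float:
    """Equation 15: q(n) = 0 and q(s) = delta."""
    return (delta * (k - 1) + 1) / k


def bound_when_qs_is_one(k: int, delta: float) -> float:
    """Equation 16: q(n) = 1 - delta and q(s) = 1; requires delta < 1."""
    return (1 - delta**k) / (k * (1 - delta))


DELTA = 0.5  # illustrative difference q(s) - q(n)
for k in (2, 4, 8):
    middle = p_correct_sum(k, 0.25, 0.25 + DELTA)
    print(k, bound_when_qs_is_one(k, DELTA), middle, bound_when_qn_is_zero(k, DELTA))
```

For k = 2 the three values coincide at (1 + Δ)/2, and they separate as k grows, which is the freedom the text refers to.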

Swets (1959) ran three other subjects in the Yes-No and the two- and
four-alternative forced-choice designs. If we estimate the values of
q(s) and q(n) from the Yes-No and two-alternative forced-choice data
using the method of Figure 4, then we can predict what should be
observed in the four-alternative forced-choice experiment.⁷ Because it
is not always clear from the Yes-No data which limb was used, the
calculations are reported for both limbs in Table 3. These predictions
suggest that Subject 1 was operating on the lower limb; that Subject 2
was on the upper limb for at least the three most intense stimuli; and
that the picture is not clear for Subject 3. It is certainly the case
that one of the two predictions is always near the observed value.

⁷ I wish to thank J. A. Swets for providing me with the raw data to
make these calculations.

Tanner, Swets, and Green (1956) report four-alternative forced-choice
data for the same subjects whose Yes-No data are shown in Figure 3.
Estimating q(n) = 0.11 and q(s) = 0.68 from the Yes-No data for
Subject 1, we predict $p_4(C)$ = 0.63; 0.60 was observed. For
Subject 2, q(n) = 0.28, q(s) = 0.74, and we predict $p_4(C)$ = 0.51;
0.56 was observed. In both cases, $p_4(C)$ was estimated from 297
observations.
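
The two four-alternative predictions quoted above can be reproduced from the stated estimates, assuming the closed form of Equation 14 with k = 4 and equal response biases. The function name is a label chosen for this check.

```python
def predicted_p4_correct(q_n: float, q_s: float) -> float:
    """Closed form of Equation 14 with k = 4 (requires q_n > 0)."""
    k = 4
    return (q_s - (1 - q_n) ** (k - 1) * (q_s - q_n)) / (k * q_n)


# Threshold estimates quoted in the text for the Tanner, Swets, and Green (1956) subjects.
subject_1 = predicted_p4_correct(q_n=0.11, q_s=0.68)
subject_2 = predicted_p4_correct(q_n=0.28, q_s=0.74)
print(round(subject_1, 2), round(subject_2, 2))
```

Rounded to two places these agree with the predictions reported in the text, 0.63 and 0.51.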


DISTORTION OF THE PSYCHOMETRIC
FUNCTION


TABLE 3
FOUR-ALTERNATIVE FORCED-CHOICE EXPERIMENT

[Table body not recoverable.]

Note. Estimates of q(n) and q(s) from Swets (1959) Yes-No and
two-alternative forced-choice data, and the predicted and observed
[...] at each signal level in each experimental condition. Decimal
points have been omitted on all the probabilities.

A plot of the Yes-No detection probability versus a physical measure
of the stimulus magnitude is usually called a psychometric function.
For example, in the von Békésy-Stevens quantal theory, the theoretical
function is 0 for all stimulus increments less than one amount, 1 for
all increments larger than another, and a straight line between these
two points. As no distinction has been made in the quantal literature
between what we are calling q(s) and $p_\infty(Y|s)$, it is not
perfectly clear which function is meant. There is no question that in
testing the theory, estimates of $p_\infty(Y|s)$ have been plotted
against increment size, but an examination of the theory itself
suggests that we should interpret it as referring to q(s).

Assuming that the above learning
model for biasing is correct, the
relation between p∞(Y|s) and q(s) for
the lower limb bias is obtained from
Equations 1 and 8; it is

$$p_\infty(Y \mid s) = \frac{q(s)^2}{q(s) + b\,q(n)}.$$

Because bq(n) > 0, p∞(Y|s) < q(s)
on the lower limb, and its maximum
value, 1/[1 + bq(n)], occurs when
q(s) = 1. Similarly, Equations 2 and
11 yield the result for the upper limb:

$$p_\infty(Y \mid s) = q(s) + u_\infty[1 - q(s)] = q(s) + \frac{[1 - q(s)]^2}{1 - q(s) + [1 - q(n)]\,b}.$$

In this case p∞(Y|s) > q(s), and its
minimum value, q(n) + [1 − q(n)]/
(1 + b), occurs when q(s) = q(n). If
we suppose that q(s) is a rectilinear
function having p = 1 and p = 0
intercepts in 2:1 ratio and that
q(n) = 0.05, then we get plots like
those shown in Figure 7, where b is a
parameter.
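
The stated extreme values can be checked directly. The sketch below assumes that the lower and upper limb relations have the forms written on the first line of each pair; these forms are adopted here because they reproduce the extremes quoted in the text, and the numerical case b = 1 is chosen only for illustration.

```latex
\begin{align*}
\text{Lower limb:}\quad
p_\infty(Y \mid s) &= \frac{q(s)^2}{q(s) + b\,q(n)}
  = q(s)\cdot\frac{q(s)}{q(s) + b\,q(n)} < q(s)
  \quad\text{since } b\,q(n) > 0, \\
p_\infty(Y \mid s)\Big|_{q(s)=1} &= \frac{1}{1 + b\,q(n)}. \\[6pt]
\text{Upper limb:}\quad
p_\infty(Y \mid s) &= q(s) + \frac{[1 - q(s)]^2}{1 - q(s) + b\,[1 - q(n)]}
  \ge q(s), \\
p_\infty(Y \mid s)\Big|_{q(s)=q(n)} &= q(n) + \frac{[1 - q(n)]^2}{[1 - q(n)](1 + b)}
  = q(n) + \frac{1 - q(n)}{1 + b}. \\[6pt]
\text{With } q(n) = 0.05,\ b = 1:\quad
\frac{1}{1 + 0.05} &\approx 0.952,
\qquad 0.05 + \frac{0.95}{2} = 0.525.
\end{align*}
```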

Once again, we are not sure when 
the subject switches from lower to 
upper limb biasing. It is clear that 
such a switch must occur, for when he 
is on the lower limb p∞(Y|s) can never
reach 1, no matter how intense the 
stimulus is. 


[Figure 7: p(Y|s) plotted against stimulus magnitude in quantal units.]


FIG. 7. Theoretical upper and lower limb
psychometric functions when q(s) is assumed
to be rectilinear with a 2:1 ratio of intercepts,
q(n) = 0.05 and b = 1/10, 1, and 10. (The
upper limb curves are above, and the lower
limb ones are below the q(s) curve, which is
indistinguishable from the lower limb
b = 1/10 curve.)


The following hypothesis is cur-
rently under investigation and it ap-
pears to have some merit. In neural
quantum theory (Stevens et al., 1941),
those stimulus increments that cause
zero and one quantum changes are
assumed not to be detected, whereas
those that cause changes of two or
more quanta are. Let us suppose that
this defines our states D̄ and D, re-
spectively. In addition to this as-
sumption, let us postulate that the
subject also uses the change in the
number of quanta excited to decide
which bias to use. Specifically, let
us suppose that there is an integer
h ≥ 0, such that if the stimulus pro-
duces a change of fewer than h quanta,
he imposes a lower limb bias, and that
if it produces a change of h or more,
he imposes an upper limb bias. We
do not know what determines the
choice of h, but presumably it de-
pends in part upon instructions, pres-
entation probabilities, and payoffs.
In any event, we can see what sorts of
psychometric functions result from
different choices for h.

For h = 0, the subject always uses
an upper limb bias; these are the
upper functions shown in Figure 7.
For all other values of h, there are re-
gions of stimulation where either h − 1
or h new quanta are excited by the
stimulus, and so the data will be a
weighted average of the response
curves resulting from lower and upper
limb biases. The probability that h
additional quanta are excited depends
upon the probability that the energy
residue of the background plus the
stimulus exceeds h quantal units of
energy. In quantum theory it is
usually assumed that the distribution
of residues is (approximately) uni-
form, in which case we simply have
the following rule: Let s denote the
value of the stimulus in quantal units;
if s < h − 1, then there is a lower
limb bias; if h − 1 < s < h, then
there is a lower limb bias with prob-
ability h − s and an upper limb one
with probability 1 − (h − s); and for
s > h, there is an upper limb bias.
Using this rule, typical functions are
shown for h = 1, 2, 3, and 4 in Figure
8. For h larger than 4, the functions
are just like those for h = 4, except
that the right hand plateau extends
over h − 3 quanta units.


[Figure 8: p(Y|s) plotted against stimulus magnitude in quantal units, with panels labeled by h.]


FIG. 8. Theoretical p∞(Y|s) psychometric functions for different values of h when q(s)
is assumed to be rectilinear with a 2:1 ratio of intercepts, q(n) = 0.05, and b = 1/10, 1, and 10.
(For h = 3 and 4, the b = 1/10 curve is indistinguishable from the q(s) curve.)
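
The switching rule can be sketched numerically. The following Python sketch is illustrative only and rests on several assumptions that are not given in the text: the lower limb relation is taken as q(s)²/[q(s) + bq(n)] and the upper limb relation as q(s) + [1 − q(s)]²/{1 − q(s) + b[1 − q(n)]}, which reproduce the extreme values stated above; q(s) is taken to rise linearly from 0 at one quantal unit to 1 at two quantal units and is floored at q(n); and all function names are hypothetical.

```python
def q_rectilinear(s, q_n=0.05):
    """Assumed true detection probability q(s).

    Rises linearly from 0 at s = 1 to 1 at s = 2 quantal units
    (a 2:1 ratio of intercepts) and is floored at q(n); the floor
    is an assumption made here so that q(s) >= q(n).
    """
    return min(1.0, max(q_n, s - 1.0))


def lower_limb(q_s, q_n, b):
    """Assumed asymptotic Yes probability under a lower limb bias."""
    return q_s ** 2 / (q_s + b * q_n)


def upper_limb(q_s, q_n, b):
    """Assumed asymptotic Yes probability under an upper limb bias."""
    miss = 1.0 - q_s
    return q_s + miss ** 2 / (miss + b * (1.0 - q_n))


def upper_weight(s, h):
    """Probability that an upper limb bias is imposed for stimulus s."""
    if s <= h - 1:
        return 0.0          # fewer than h quanta: lower limb only
    if s >= h:
        return 1.0          # h or more quanta: upper limb only
    return 1.0 - (h - s)    # mixture region, uniform residues assumed


def p_yes(s, h, b, q_n=0.05):
    """Weighted average of the two limb curves for switching point h."""
    q_s = q_rectilinear(s, q_n)
    w = upper_weight(s, h)
    return (1.0 - w) * lower_limb(q_s, q_n, b) + w * upper_limb(q_s, q_n, b)


if __name__ == "__main__":
    for h in (0, 1, 2, 3, 4):
        curve = [round(p_yes(k / 2, h, b=1.0), 3) for k in range(0, 11)]
        print(f"h={h}:", curve)
```

Running the script prints one row per value of h, sampled every half quantal unit from 0 to 5; with h = 0 every point lies on the upper limb curve, as the text states.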


CONCLUSION 


The central conclusion of this paper
is that there is at least one sensory
threshold model for simple detection
experiments which is not clearly wrong
as judged by existing data. Four
features of the model are noteworthy,
of which two are really problems.
First, the biasing effects on the re- 
sponse behavior that result from pay- 
offs and presentation probabilities 
were treated as the asymptotic conse- 
quences of a linear learning process, 
not as the usually assumed maximiza- 
tion of expected payoffs which, when 
coupled with this threshold model, 
yields totally incorrect results. Sec- 
ond, the dependence of the asymptotic 
response probabilities on the prob- 
abilities of stimulus presentations is 
explicit, but the dependence upon the 
payoffs is given only in terms of 
learning rate parameters and so is 
implicit. A theory relating the learn-
ing parameters to payoffs is needed,
but in the meantime sequential data 
should be collected and the parameters 
estimated directly as in ordinary 
applications of stochastic learning 
theory. Third, the threshold analysis 
of the Yes-No detection experiment 
led to an isosensitivity curve that 
consists of two distinct line segments, 
but no criterion was developed about 
which limb is in use during a par- 
ticular experimental run. This leads 
to an ambiguity in the estimates of 
the true threshold probabilities, which 
continually proved to be a bothersome 
problem in analyzing data and eval- 
uating the model. The data in 
Figure 3 and in Table 2 hint at the 
possibility that subjects may shift 
between the limbs when

$$\beta = \frac{1 - P(s)}{P(s)} \cdot \frac{o_{22} - o_{21}}{o_{11} - o_{12}}$$

is in the neighborhood of 1.5 or 2,
suggesting that it may be better to
choose presentation probabilities and
payoffs so that β lies well outside this
transition region. In this way, we can
be reasonably sure whether the sub-
ject is using a lower or upper limb bias.
For example, if the payoff matrix is
symmetric, one might use P(s) in the
neighborhood of 0.2 or 0.8, which
places β in the neighborhood of 4 or 1/4,
respectively. Finally, the theoretical
effects of this sort of biasing on the 
psychometric function were shown to 
be quite striking (see Figure 8). If 
this, or some such, model is correct, 
then fairly subtle analyses of response 
data are necessary in order to test 
adequately any theory of the psycho- 
metric function, such as the neural 
quantum theory. 


ATTENTION:
SOME THEORETICAL CONSIDERATIONS *


J. A. DEUTSCH AND D. DEUTSCH
Stanford University 


The selection of wanted from unwanted messages requires discrimina-
tory mechanisms of as great a complexity as those in normal percep-
tion, as is indicated by behavioral evidence. The results of neuro-
physiology experiments on selective attention are compatible with this
supposition. This presents a difficulty for Filter theory. Another
mechanism is proposed, which assumes the existence of a shifting ref-
erence standard, which takes up the level of the most important arriv-
ing signal. The way such importance is determined in the system is
further described. Neurophysiological evidence relative to this postu-
lation is discussed.


There has, in the last few years, 
been an increase in the amount of 
research devoted to the problem of 
attention, which has been summarized 
in Broadbent's (1958) important 
work. Whilst psychologists have 
been investigating the behavioral as- 
pects of attention, suggestive evidence 
has also been found by neurophysiolo- 
gists. We feel that it would be useful 
at this time to consider the theoretical 
implications of some of this research. 

Our paper is divided into three 
parts. In the first we consider some 
of the behavioral findings on attention. 
In the second a system is proposed 
to account for various features of this 
behavior. Although we do not con- 
sider it necessary to identify a system 
of this type with particular neural 
structures (see Deutsch, 1960) since 
a machine embodying such a system 
would also display the behavior we 
wish to explain, we do, however, ven-
ture some tentative hypotheses con-
cerning the neural identification of the
proposed system.


BEHAVIORAL CONSIDERATIONS


However alert or responsive we 
may be, there is a limit to the number 
of things to which we can attend at 
any one time. We cannot, for in- 
stance, listen effectively to the con- 
versation of a friend on the telephone 
if someone else in the room is simul- 
taneously giving us complex instruc- 
tions as to what to say to him. And 
this difficulty in processing informa- 
tion from two different sources at the 
same time occurs even if no overt 
response is required. This phenome- 
non of selective attention has been in- 
vestigated in a number of experiments. 
The most important of these deals 
with the processing of information 
emitted simultaneously by two sepa- 
rate sound sources (Broadbent, 1954; 
Cherry, 1953; Spieth, Curtis, & Web- 
ster, 1954). Two problems arise 
from the results of such experiments. 
The first is how different streams of 
information are kept distinct by the 
nervous system, and how a resultant
babel is thereby avoided. The second
is why only one of the messages (once
it has been kept distinct and separate)
is dealt with at any one time. A
solution to the first problem,
based on experiments in which two
messages were fed simultaneously one
to each ear, was that the messages were
kept distinct by proceeding down sepa-
rate channels (such as different neu-
ral pathways). Nor was it difficult
for Broadbent (1958) to extend such
a notion to other cases. It had been
shown in numerous experiments that
we are enabled to listen to one of two
simultaneous speech sequences while
ignoring the other, by selecting items
for attention which have some fea-
ture or features in common, such as
their frequency spectra (Egan, Car-
terette, & Thwing, 1954; Spieth et
al., 1954) and their spatial localiza-
tion (Hirsch, 1950; Poulton, 1953;
Webster & Thomson, 1954). It was
supposed that relatively simple mecha-
nisms were responsible for segregation
according to these categories, though
the principles of their operation were
not made clear.

Broadbent's (1958) answer to the
second problem, of how one message
is admitted to the exclusion of others,
followed from the notions we have
already considered. It was proposed
that there was a filter which would
select a message on the basis of charac-
teristics toward which it had been bi-
ased and allow this message alone to
proceed to the central analyzing mech-
anisms. In this way, messages with
other characteristics would be ex-
cluded and so the total amount of dis-
crimination which would have to be
performed by the nervous system
would be greatly decreased. Whole
complex messages could be rejected on
the sole basis of possessing some sim-
ple quality, and no further analysis of
them would occur.

to be responsible. Sharpless and
Jasper (1956), studying habituation 
to auditory stimuli in cats, found that 
habituation, both behavioral and EEG, 
was specific not only to the frequency 
of sound presented, but also to the 
pattern in which a combination of fre- 
quencies was presented. Evidence for 
human subjects is presented by Soko-
lov (1960) and Voronin and Sokolov
(1960), who report that when habitu- 
ation has been established to a group 
of words similar in meaning but dif- 
ferent in sound, then arousal occurred 
to words with a different meaning. 
Behavioral data on the arousal of 
curiosity in rats upon the presentation 
of novel visual patterns are reported 
by Thomson and Solomon (1954). 
Such evidence as the above would 
require us, on filter theory, to postu- 
late an additional discriminative sys- 
tem below or at the level of the filter, 
perhaps as complex as that of the 
central mechanism, to which informa- 
tion was assumed to be filtered. 
Howarth and Ellis (1961) have 
presented an ingenious experimental 
argument to show that the same dis- 
criminatory mechanism functions in 
normal perception and when, on filter 
theory, the discrimination would have 
to be performed at the level of the 
filter. The case they put forward is 
as follows. Moray (1959) had shown 
that if a subject is listening selectively 
to one channel and ignoring the other, 
calling his name on the rejected chan- 
nel will on a certain proportion of 
instances cause him to switch his at-
tention to this channel. This was
explained by assuming that the sub- 
ject’s name had a higher priority for 
the filter than the message to which 
he had been attending. Oswald, Tay- 
lor, and Treisman (1960) in a well- 
controlled experiment reported that 
during sleep a subject tends to respond 
selectively to his own name. Howarth 
and Ellis (1961) went on to show 
that the subject’s name has a signifi- 
cantly lower threshold than other 
names when the subject is required 
to listen normally and there is mask- 
ing by noise. After analyzing quanti- 
tatively their results and those ob- 
tained by Oswald et al. and Moray, 


they (Howarth & Ellis, 1961) con-
clude that,


There is, therefore, a very impressive
amount of agreement among these three
very different experiments concerning the
relative intelligibility of one’s own name.
It seems an obvious conclusion to suppose
that the same pattern-analyzing mechanism
is required to account for behavior during
dichotic listening or during sleep as
during ordinary listening under noise.


Thus although Broadbent's
(1958) filter theory provides an in- 
genious explanation of the selection of 
messages by means of simple and few 
discriminations, such as which ear is 
being stimulated, it becomes less at- 
tractive as an explanation of those 
cases where complex and many dis- 
criminations, discussed above, are 
needed. 

If we may identify levels in filter 
theory with neural levels, then there 
is also evidence against a two-level 
system to account for novelty and 
habituation on neurological grounds. 
Sharpless and Jasper (1956) found
that specificity of habituation to tonal
pattern was destroyed by bilateral le-
sions of regions of cortex concerned
with audition. It is known from other work
(Goldberg, Diamond, & Neff, 1958) 
that sound pattern discrimination is a 
cortical function. On the other hand, 
frequency specific habituation was 
maintained with Sharpless and Jasper’s 
lesions and it has been shown that 
frequency discrimination can be taught 
to animals without these cortical areas 
(Goldberg et al., 1958). This shows, 
first, that the level at which habitua- 
tion occurs is not the same for both 
pattern and tone, and second, that the 
destruction of the level which is es- 
sential to normal functioning also de- 
stroys an animal’s ability to habituate. 
This renders it plausible to assume
that the mechanism responsible for ha-
bituation is not on a different level
from that responsible for other learn-
ing and discrimination.


THEORETICAL CONSIDERATIONS 


This review of the behavioral evi- 
dence leads us to the probable conclu- 
sion that a message will reach the 
same perceptual and discriminatory
mechanisms whether attention is paid 
to it or not; and such information is 
then grouped or segregated by these 
mechanisms. How such grouping or 
segregation takes place is a problem 
for perceptual theory and will not con- 
cern us here. We may suppose that 
each central structure which is excited 
by the presentation of a specific quality 
or attribute to the senses, is given a 
preset weighting of importance. The 
central structure or classifying mecha- 
nism with the highest weighting will 
transfer this weighting to the other 
classifying mechanisms with which it 
has been grouped or segregated. 
The main point with which we are 
concerned is the following. Given 
that there is activity in a number of 
structures, each with a preset weight- 
ing of importance, how might that 
group of structures with the greatest 
weighting be selected? Or, in behav- 
ioral terms, how might the most im- 
portant of a group of signals be se- 
lected? Any system which performs 
such a function must compare all the 
incoming signals in importance. This
could be done by comparing each in- 
coming signal continuously to every 
other incoming signal and deciding 
which is the most important by seeing 
which signal has no other signal which 
exceeds it in the physical dimension 
by which “importance” is represented.
But a small amount of reflection will 
suffice to show that such a system is 
very uneconomical. Each possible in- 
coming signal must have a provision 
in the shape of numerous comparing 
mechanisms, through each of which it 
will be connected to all other possible
signals. So that as the number of pos-
sible signals increases, the number of
mechanisms to compare them all
against each other will increase at an
enormous rate. If the same comparing
mechanisms are to be shared by pairs
of signals then the time to reach a de-
cision will increase out of all bounds.

However, there is a simpler and 
more economical way to decide that 
one out of a group of entities is the 
largest. Suppose we collect a group of 
boys and we wish to decide which is 
the tallest. We can measure them in- 
dividually against each other and then 
select the boy in whom this compari- 
son procedure never yielded the an- 
swer “smaller.” This is like the sys- 
tem outlined above. The decision 
smaller will be made in this case when 
we lower a horizontal plane or ruler 
down on the heads of two boys. The 
boy whose head is touched by this 
instrument is declared to be larger 
and the other boy smaller. But such
a procedure is cumbersome because 
there are many pairs of boys and we 
must scan through many records of 
individual boys before we can select 
the tallest. We could, of course, argue 
that a simpler solution would be to use 
an absolute measure of height, such 
as a ruler with feet and inches in- 
scribed on it. But this procedure is 
not really simpler. Each boy must
be compared against the ruler, and 
then the measurements themselves 
must be compared against each other 
in much the same way as the boys 
were to decide on the larger and 
smaller in each couple. 

If we are simply interested in find- 
ing the tallest boy, then an alternative 
procedure may be used. Suppose we 
collect our group below our board 
which is horizontal and travels lightly 
up and down, and then ask all our 
group to stand up below it. Then the 
boy whose head touches the board
when the whole group is standing up 
will be the tallest boy in the group. 
If then we call him out, the board 
will sink until it meets the head of the 
next tallest individual. If we intro- 
duce some other boys into the group, 
then if there is a taller boy in this 
group the board will be raised until 
it corresponds to his height. In such 
a system only the tallest individual will 
make contact with the board, and so 
he will himself have an immediate sig- 
nal that he is the tallest boy. 
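
The difference in economy between the two procedures can be put in numbers. The count below assumes one comparison (or one comparing mechanism) per unordered pair of boys in the pairwise procedure, and one contact test per boy against the board in the alternative; this bookkeeping is an interpretation offered for illustration and is not given in the text.

```latex
\begin{align*}
\text{pairwise comparisons for } n \text{ boys:}\quad
  &\binom{n}{2} = \frac{n(n-1)}{2}, \\
\text{contact tests against a single board:}\quad
  &n. \\[4pt]
n = 10:\quad &\tfrac{10 \cdot 9}{2} = 45 \ \text{versus}\ 10, \\
n = 100:\quad &\tfrac{100 \cdot 99}{2} = 4950 \ \text{versus}\ 100.
\end{align*}
```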

Now suppose that instead of boys, 
we have signals, not varying in height, 
but in some other dimension (which 
we may continue to call “height”) 
which corresponds to their importance 
to the organism. Suppose that each 
signal as it arrives is capable of push-
ing some “level” up to its own
“height” (the height determined by its 
importance), then the most important 
signal arriving at any particular time 
will determine this level, analogous to 
the horizontal board in our example. 
It will then be the case that any signals 
which arrive then or after and are of 
lesser importance and so of smaller 
height will be below this level. How- 
ever, if the signal of greater height 
ceases to be present, then the level will 
sink to the height reflecting the im- 
portance of one of the other signals 
which is arriving. 

If we suppose that only signals 
whose height corresponds to the height 
of the level switch in further processes, 
such as motor output, memory storage, 
and whatever else it may be that leads 
to awareness, we have the outline of 
a system which will display the type 
of behavior we associate with atten- 
tion. Only the most important sig- 
nals coming in will be acted on or 
remembered. On the other hand, 
more important signals than those 
present at an immediately preceding
time will be able to break in, for these
will raise the height of the level and 
so displace the previously most im- 
portant signals as the highest. 
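
The behavior of such a level can be sketched in a few lines of Python. This is a toy stand-in written under stated assumptions, not a model of any neural structure: importance is represented as a plain number, the class and method names are hypothetical, and the level is simply recomputed as the largest importance among the signals currently present.

```python
class ImportanceLevel:
    """Toy model of a shifting reference level (the "board" of the analogy)."""

    def __init__(self):
        # Maps each signal that is currently present to its importance.
        self.present = {}

    def arrive(self, name, importance):
        """A signal arrives and may push the level up to its own height."""
        self.present[name] = importance

    def cease(self, name):
        """A signal stops arriving; the level may then sink."""
        self.present.pop(name, None)

    @property
    def level(self):
        """Height of the level: the greatest importance now present."""
        return max(self.present.values(), default=0.0)

    def selected(self):
        """Signals whose height equals the level; only these are acted on."""
        current = self.level
        return sorted(n for n, imp in self.present.items() if imp == current)


if __name__ == "__main__":
    system = ImportanceLevel()
    for name, importance in (("a", 2.0), ("b", 5.0), ("c", 3.0)):
        system.arrive(name, importance)
    print(system.selected())   # the most important signal present
    system.cease("b")
    print(system.selected())   # the level sinks to the next signal
    system.arrive("d", 9.0)
    print(system.selected())   # a more important signal breaks in
```

The three printed lists correspond to the three cases described above: selection of the most important signal, sinking of the level when that signal ceases, and displacement by a more important newcomer.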

So far we have omitted any discus- 
sion of the role of general arousal 
in selective attention. Without such 
arousal, usually (but not invariably, 
Bradley & Elkes, 1953; Gastaut, 1954) 
indicated by characteristic patterns on 
the electroencephalogram, awareness 
of and behavioral responsiveness to 
peripheral stimulation are absent. 
Some degree of general arousal is 
thus necessary for attention to oper- 
ate. Furthermore, individuals when 
aroused will attend to any incoming 
message, provided that it is not con- 
comitant with a more important one, 
whereas when asleep they will only 
respond to very “important” messages, 
such as a person’s own name (Oswald 
et al., 1960) or, in the case of a 
mother, the sound of her infant crying. 
And when drowsy, though responsive 
to a larger range of stimuli than when 
asleep, subjects will tend to “miss” 
signals which they would notice when 
fully awake. 

The system which takes this into 
consideration is schematically repre- 
sented in the diagram below (Figure 
1). Any given message will only be 
heeded if the horizontal line (Y) rep- 
resenting the degree of general arousal 
meets or crosses the vertical line, the 
height of which represents the “im- 
portance” of the message. Whether 
or not alerting will take place then 
depends both on the level of general 
arousal and on the importance of the 
message. Attention will not be paid 
to Message b though it is the most 
important of all incoming signals, 
when the level of general arousal is 
low (Position X). When the level 
of general arousal is at Z, which is 
very high, attention could be paid to 
all the signals a, b, c, d, and e. In
fact, attention is paid only to b as a
result of the operation of the specific
alerting mechanism.


[Figure 1: vertical axis, importance of message; horizontal axis,
actual messages a to e; horizontal lines X (asleep), Y (drowsy),
and Z (alert).]


FIG. 1. Diagram to illustrate operation
of proposed system. (The interrupted hori-
zontal line represents the “level” of im-
portance in the specific alerting system
which is raised and lowered according to
the incoming messages. The solid horizon-
tal lines represent levels of general arousal.
At X, the organism is asleep and none of
the actual messages produce alerting. At Y,
the organism is drowsy and only some in-
coming messages produce alerting. At Z,
the organism is awake. All messages could
be alerted to, but the specific alerting sys-
tem allows only b to be heeded.)
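
The interaction of general arousal with the specific alerting mechanism can be illustrated with a short Python sketch. Everything numerical here is hypothetical: the importance values assigned to messages a to e, and the three arousal thresholds, are invented so that the three cases X, Y, and Z come out as described; the function names are likewise assumptions.

```python
# Hypothetical importance a message must reach to be heeded in each
# state of general arousal (a lower threshold means higher arousal).
AROUSAL_THRESHOLD = {"asleep": 9.0, "drowsy": 5.0, "alert": 0.0}


def could_alert(messages, state):
    """Messages whose importance meets or crosses the arousal line."""
    threshold = AROUSAL_THRESHOLD[state]
    return sorted(name for name, imp in messages.items() if imp >= threshold)


def heeded(messages, state):
    """The single message attended to, or None if none reaches the line.

    Among the messages that could produce alerting, the specific
    alerting mechanism admits only the most important one.
    """
    candidates = could_alert(messages, state)
    if not candidates:
        return None
    return max(candidates, key=lambda name: messages[name])


if __name__ == "__main__":
    # Hypothetical importance weightings; b is the most important.
    messages = {"a": 3.0, "b": 8.0, "c": 5.0, "d": 2.0, "e": 6.0}
    for state in ("asleep", "drowsy", "alert"):
        print(state, could_alert(messages, state), heeded(messages, state))
```

With these invented values, no message is heeded in the asleep state, only some messages could produce alerting in the drowsy state, and in the alert state all five could but only b is selected.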

Further, it is supposed that a mes- 
sage will increase the level of general 
arousal in proportion to its importance 
and for various lengths of time in 
proportion to its importance, so that 
messages which would not have been 
heeded before will command attention 
if they follow in the wake of a more 
important message. 

The mechanism whereby the weight- 
ing of importance of messages is car- 
ried out is given by Deutsch’s (1953, 
1956, 1960) theory of learning and 
motivation, and will be only briefly 
summarized here, since it is not the 
main point of the paper. It is as- 
sumed that on exposure to a succes- 
sion of stimuli, link-analyzer units 
responsive to these stimuli will be 
connected together. Certain primary 
links, when stimulated by physiologi- 
cal factors, generate excitation, and 
this is passed on from link to link 
along the connections established by 
experience. Each link-analyzer unit
will receive excitation depending first, 
on the state of the primary links to
which it is connected, either directly
or indirectly, and second, on the “re- 
sistance” of such a connection, which 
is determined by past learning. It is 
assumed that the amount of such ex- 
citation arriving at a link-analyzer 
unit determines both its threshold of 
excitability by incoming stimuli (lead- 
ing to an increased readiness to per- 
ceive a stimulus whether it is there 
or not) and the ranking of importance 
of such a stimulus (e.g., Lawrence, 
1949, 1950). We should predict from 
this theory an inverse correlation be- 
tween the attention-getting or distract- 
ing value of a stimulus when attention 
is being paid to another, and its thresh- 
old (regarded as the likelihood of its 
being reported by a subject when he is 
asked to say what he perceives). We 
should also expect that stimuli which 
have a high importance weighting 
should more often be mistakenly per- 
ceived when similar stimuli are pres- 
ent. 


NEUROPHYSIOLOGICAL CORRELATES 


We may ask how the suggested sys- 
tem would fit what is known of the 
physiological substrate of attentive be- 
havior. One of the salient features of 
the system as proposed is that it as- 
sumes that all sensory messages which 
impinge upon the organism are per- 
ceptually analyzed at the highest level. 
It would therefore be of relevance to 
discuss the group of neurophysiologi- 
cal experiments the results of which 
have been claimed to demonstrate a 
neural blockage of “rejected” mes- 
sages at the lower levels of the primary 
sensory pathways. Hernández-Peón,
Scherrer, and Jouvet (1956) showed
that the evoked response at the dorsal
cochlear nucleus to clicks was reduced
by the presentation of “distracting”
olfactory and visual stimuli. A simi-
lar effect was found in the visual
pathways (Hernández-Peón, Guzmán-
Flores, Alcaraz, & Fernández-Guardi-
ola, 1957). Stimulation of the reticu-
lar formation could produce similar
results, and it was supposed that such 
stimulation was treated as the presen- 
tation of a distracting stimulus. It 
has also been demonstrated by various 
workers (e.g., Galambos, Sheatz, &
Vernier, 1956; Hernández-Peón &
Scherrer, 1955) that responses to au-
ditory clicks recorded from the dorsal
cochlear nucleus (as well as other
placements) diminish with repetition.
Habituation to photic stimuli has been
demonstrated for the retina (Palestini,
Davidovich, & Hernández-Peón, 1959)
and for the olfactory bulb (Hernández-
Peón, Alcocer-Cuarón, Lavín, & Santi-
báñez, 1957). It was therefore pro-
posed that during inattention to a
signal (either by distraction or habitu- 
ation) information concerning this 
signal was blocked at the level of the 
first sensory synapse by means of “af- 
ferent neuronal inhibition.” Recently, 
however, evidence has been produced 
indicating, at least for the visual and 
auditory pathways, that such changes 
in the evoked potential were due to 
peripheral factors, and represented 
simply a decrease in the effective in-
tensity of the stimulus. Hugelin, Du-
mont, and Paillas (1960) report that
when the middle ear muscles were
cut, stimulation of the reticular forma-
tion would not cause a diminution in
the amplitude of the evoked responses.
They report further that such con-
tractions of the middle ear muscles
which result from reticular stimulation
produce a mean diminution of micro-
phonic potentials of less than 5 deci-
bels. The reduction in sensation
brought about by these means there-
fore appears unimportant. Naquet,
Regis, Fischer-Williams, and Fernan-
dez-Guardiola (1960) found that if
the size of the pupil were fixed by local
application of atropin the evoked
potential recorded from placements
below the cortex demonstrated a con-
sistent amplitude.

The above findings do not, how- 
ever, apply to changes in the cortical 
evoked response during distraction or 
habituation. Moushegian, Rupert,
Marsh, and Galambos (1961) found 
that in animals in which the middle 
ear muscles had been cut, cortical 
evoked responses to clicks still demon- 
strated diminution during habituation 
and distraction, and amplification when 
the clicks were associated with puffs 
of air to the face. Naquet, in the 
experiment quoted above, reports that 
application of atropin to the pupil did 
not prevent a variation in cortical 
evoked responses, which diminished 
during desynchronization and were 
enhanced during synchronization of 
electrical rhythms, also changing in 
morphology. 

Reports of changes in cortical 
evoked responses during habituation 
and distraction are many and varied, 
and it would be impossible in this 
space to describe the field in detail. 
Certainly disagreement exists over 
what occurs as well as over its inter-
pretation. For instance, Horn (1960),
recording flash evoked responses in 
the visual cortex of cats when resting, 
and when watching a mouse, found 
that the responses were reduced in 
amplitude when the cat was watching 
the mouse; when it ignored the mouse, 
responses remained of high amplitude. 
Further, after a series of tone-shock 
combinations, it was noted that the 
evoked response to flash was reduced 
after a series of tones only if there was 
“some visual searching component in 
the cat’s response to the acoustic stim- 
uli.” Horn argues that attenuation 
of evoked responses in the cortex 
might be correlated with greater sen- 
sitivity in the appropriate region, 
rather than signifying a reduction 
in incoming information. However,
other recent experimenters continue
to maintain that evoked responses
diminish in amplitude when attention 
is not being paid to the test stimulus. 
García-Austt, Bogacz, and Vanzulli
(1961) recorded scalp visual evoked
responses in human subjects (who 
were able to give introspective reports) 
during presentation of flash stimuli. 
They report, 

When the stimulus is significant and there-
fore attention is paid to it, the response is
relatively simple and widespread. When,
on the other hand, the stimulus is not sig-
nificant and no great attention is paid to it,
the response is reduced, complex, and
localized.


It would seem that changes in the 
evoked potential at the cortex do in- 
deed take place during habituation and 
attention shifts; but that what those 
changes exactly are, and what they 
represent, is not yet clear. 

We should indeed expect, on the 
above theory of attention, changes in 
the cortical evoked potential when at- 
tention is being paid to a stimulus, 
reflecting the activation of various 
processes, such as motor output and 
memory storage. Pertinent to this 
assumption is the discovery by Hubel,
Henson, Rupert, and Galambos
(1959) of what they term “attention”
units in the auditory cortex. By the
use of microelectrodes implanted in
unanesthetized and unrestrained cats,
they obtained records from units which
responded only when the animal was
“paying attention” to the sound source.
These attention units appeared to be 
both interspersed amongst the others 
and segregated from them. We may 
venture to interpret these results by 
supposing that the units in question 
formed part of the systems, discussed 
above, responsible for the appropriate 
motor response to stimulation or the 
committing of items to memory, and 
so forth, or that they lay on the path-
way to these systems. Thus they
would be inactive even if impulses
evoked by auditory stimulation were
reaching the cortex, provided that the 
animal was not also attending to the 
stimuli. 

There is another theoretical assump- 
tion for which we might reasonably 
seek a neurophysiological counterpart. 
We suppose that a selection of inputs 
from a variety of sources takes place 
by comparison with a fluctuating 
standard. This implies the existence 
of an undifferentiated structure with 
widespread connections with the rest 
of the central nervous system. We 
are tempted, on account of the evi- 
dence for the diffuseness of its input, 
to identify the brain stem reticular 
formation as this particular structure. 
Potentials may be evoked throughout 
this structure by excitation of various 
sensory systems (French, Amerongen, 
& Magoun, 1952; Starzl, Taylor, & 
Magoun, 1951), and various cortical 
structures (Bremer & Terzuolo, 1954; 
French, Hernández-Peón, & Livingston, 
1955). Occlusive and facilitatory 
interaction between responses 
evoked in the reticular formation from 
very different sources has further 
been observed (Bremer & Terzuolo, 
1952, 1954; French et al., 1955). 
Single unit studies demonstrating a 
convergence of input from several 
sources have also been reported 
(Amassian, 1952; Amassian & De 
Vito, 1954; Hernández-Peón & Hag- 
barth, 1955; Scheibel, Scheibel, Mol- 
lica, & Moruzzi, 1955). A similar 
conclusion, that the reticular formation 
is capable of acting as a nonspecific 
system, can be based on neuroanatomical 
evidence. Scheibel and Scheibel 
(1958) state on the basis of their 
extensive histological study: 


the degree of overlap of the collateral af- 
ferent plexuses is so great that it is difficult 
to see how any specificity of input can be 
maintained, rather it seems to integrate and 
vector a number of inputs. 

We have also postulated that the 
fluctuating level correlates with states 
of arousal. Again the brain stem re- 
ticular formation seems well suited to 
fulfill this function. Its importance in 
the regulation of states of arousal has 
been demonstrated both through work 
involving lesions (Bremer, 1935; 
French, 1952; French & Magoun, 
1952; Lindsley, Schreiner, Knowles, 
& Magoun, 1949) and stimulation of 
this structure (Moruzzi & Magoun, 
1949; Segundo, Arana, & French, 
1955). Recently Moruzzi (1960) 
has shown that the lower brain stem 
may play an important role in the 
initiation of sleep. It also seems 
likely that the thalamic reticular sys- 
tem is involved in the regulation of 
states of arousal. Large bilateral le- 
sions of the anterior portion of this 
system may produce coma analogous 
to that produced by lesions of the mid- 
brain (French et al., 1952) although 
the depth of coma so produced is less 
profound. Stimulation of portions of 
this system has also been shown to 
produce either sleep or arousal de- 
pending on the parameters of stimu- 
lation (Akimoto, Yamaguchi, Okabe, 
Nakagawa, Nakamura, Abe, Torii, & 
Masahashi, 1956; Hess, 1954). 

The work of Adametz (1959), Chow 
and Randall (1960), and Doty, Beck, 
and Kooi (1959), who demonstrated 
that with different operational techniques 
and with assiduous nursing 
care massive lesions of the midbrain 
reticular formation need not produce 
coma, should, however, be considered. 
Chow, Dement, and Mitchell (1959) 
found also that massive lesions in the 
thalamic reticular system need not 
produce coma. Until reasons for these 
discrepant results are found we must 
regard our conclusions as to the role 
of the reticular system in attention as 
tentative. 

Whatever the explanation of the 
findings on lesions in the reticular 
formation may turn out to be, it seems 
that, if we are right, some diffuse and 
nonspecific system is necessary as a 
part of the mechanism subserving se- 
lective attention. Such a system should 
be found to have afferent connections 
from all discriminatory and perceptual 
systems. Through these connections 
it should be influenced to take up a 
variety of levels; the level at any one 
time corresponding with the level of 
the “highest” afferent message from 
the discriminatory mechanisms. On 
its efferent side such a nonspecific 
system should again be connected 
with all discriminatory and perceptual 
mechanisms. Through such connec- 
tions it would signal to them its own 
level. If this level of the nonspecific 
system was above that of a particular 
discriminatory mechanism, no regis- 
tration in memory or motor adjust- 
ment would take place, if such a 
discriminatory mechanism was stimu- 
lated. Consequently, only that dis- 
criminatory mechanism being activated 
whose level was equal to that of the 
diffuse system would not be affected. 
In this way the most important mes- 
sage to the organism will have been 
selected. 


A VARIABLE SENSITIVITY THEORY OF SIGNAL DETECTION¹ 


A 2-process model for signal detection is proposed that is applicable 
to both yes-no and forced-choice experiments. One process describes 
systematic changes that may occur in the S's sensitivity level to external 
stimuli; the other process defines a learning mechanism that determines 
trial-to-trial changes in the S's decision rule as information accrues to 
him. From the theory one can derive predictions for gross statistics 
like receiver-operating-characteristic curves and also for detailed 
sequential effects such as autocorrelation functions defined over 
stimulus-response runs. Predictions for sequential effects are particularly 
important in evaluating the theory and provide a valuable insight into the 
character of the detection process. Application of the theory to various 
special cases is considered; some predictions are derived and checked 
against experimental data. 


This paper deals with an analysis 
of some simple detection experiments 
in terms of a theory that incorporates 
two separate but interdependent proc- 
esses: an activation process and a 
decision process. The activation proc- 
ess specifies the relation between 
external stimulus events and hy- 
pothesized sensory states of the sub- 
ject. The decision process specifies the 
subject's observable response in terms 
of his sensory state and information 
acquired during the course of an 
experiment. Both processes are dy- 
namic. The activation process defines 
the subject's level of sensitivity to 
external stimuli, and we postulate that 
sensitivity may fluctuate (within cer- 
tain limits) from trial to trial as a 
function of past events. The decision 
process is similarly dynamic, for it 
may change from trial to trial as 
information accrues to the subject. 


1 The ideas presented in this paper have 
been much influenced by discussions with R. 
A. Kinchla of Ames Research Center. The 
research was supported by the National 
Institute of Health under Contract M-5184. 


The processes interact in that the 
momentary state of one process 
operates in a reciprocal fashion to 
determine the state of the other. As 
will be indicated later, most theories 
of signal detection view the subject's 
sensitivity level as fixed (or at most 
fluctuating in a strictly random fash- 
ion over time) and account for 
variations in his performance to a 
fixed intensity signal by postulating 
changes in the decision rule. In con- 
trast, for the present theory changes 
in performance to a fixed intensity 
signal may arise in several ways: 
manipulating aspects of the experi- 
mental situation that affect the 
subject's sensitivity level but leave 
the decision process unchanged, ma- 
nipulating variables that affect the 
decision process but leave the sensi- 
tivity level unchanged, or manipulat- 
ing parameters that affect changes in 
both processes. 

The theory that we present gen- 
erates predictions for all aspects of the 
subject’s response protocol (mean 
response probabilities, associated vari- 
ances, sequential statistics such as 
autocorrelation functions on both 
responses and stimuli, and so forth) 
and thereby permits a detailed treatment 
of individual trial-by-trial data. 
Some predictions are parameter free, 
but by and large the predictions 
depend on estimates of parameters 
that describe the stimulus situation 
and the hypothesized detection proc- 
ess. Some readers may feel that we 
have been too liberal in postulating 
parameters; however, for most ap- 
plications, restrictions are appropriate 
that markedly reduce the number of 
parameters that need to be estimated. 
For example, predictions regarding re- 
ceiver operating characteristic curves 
and certain first order sequential 
phenomena may require that only 
two parameters be estimated. In con- 
trast, autocorrelation predictions in 
complex detection experiments may 
require that as many as six parameters 
be estimated. 

The type of psychophysical study 
to be considered is a choice experiment 
for which the experimenter has established, 
and explained to the subject, 
a one-to-one correspondence between 
the response set $(A_1, A_2, \ldots, A_r)$ and 
the stimulus presentation set 
$(S_1, S_2, \ldots, S_r)$. On each trial a stimulus 
is presented and the subject attempts 
to identify the stimulus by making the 
appropriate response. For excellent 
reviews of research and theory in this 
area see Green (1960), Licklider 
(1959), or Swets (1961). 

For purposes of this paper we shall 
consider only experiments for which 
$r = 2$. That is, on each trial either 
$S_1$ or $S_2$ is presented and the subject 
is required to make either response 
$A_1$ or $A_2$. Also, the theoretical 
development will be restricted to 
procedures where the experimenter 
informs the subject at the end of each 
trial which response was correct. 
These two restrictions are not funda- 
mental to the theory, but greatly 
simplify the presentation. Later it 
will be apparent that the model can 
be extended to multistimulus problems 
and to procedures in which information 
feedback is manipulated as 
an experimental variable. 

Two types of experimental pro- 
cedures are to be distinguished in the 
analysis. We define these in terms of 
the following examples: 

Yes-No procedure. The $S_1$ is a tone 
burst in a background of white noise 
and $S_2$ is the white noise alone. On a 
given trial either $S_1$ or $S_2$ is presented 
and the subject answers yes ($A_1$) or no 
($A_2$) regarding the presence of the 
signal. 

Forced-choice procedure. Two temporal 
intervals are defined on each 
trial, exactly one of which contains a 
signal; i.e., in one interval a tone 
burst in a background of white noise 
is presented, while in the other 
interval only the white noise is presented. 
On each trial, the subject 
is required to identify the interval he 
believes most likely to have contained 
the signal. Thus, $S_i$ ($i = 1, 2$) denotes 
a trial on which the signal occurred in 
Time Interval $i$ and $A_j$ ($j = 1, 2$) denotes 
the subject’s selection of Interval 
$j$ as the one containing the signal. 


In this paper we shall use the 
identifications given in these examples. 
That is, for the yes-no procedure $S_1$ 
will always denote signal plus noise, 
whereas $S_2$ will denote noise alone; 
for the forced-choice procedure $S_1$ will 
denote signal plus noise in the first 
interval followed by noise alone in the 
second interval, and $S_2$ indicates noise 
alone in the first interval and signal 
plus noise in the second interval. In 
addition, the following notation will 
be used: 


$S_{i,n}$ = The presentation of Stimulus 
$S_i$ on Trial $n$ of the experiment. 

$A_{j,n}$ = The occurrence of Response 
$A_j$ on Trial $n$ of the experiment. 

$E_{i,n}$ = The occurrence of an information 
event at the end of Trial $n$ 
that indicates that Stimulus $S_i$ 
was presented. 


A theoretical result of particular 
interest in analyzing detection data 
deals with the relation of $\Pr(A_{1,n} \mid S_{1,n})$ 
to $\Pr(A_{1,n} \mid S_{2,n})$. For simplicity we 
write 

$$p_{1,n} = \Pr(A_{1,n} \mid S_{1,n})$$

$$p_{2,n} = \Pr(A_{1,n} \mid S_{2,n}) \qquad [1]$$

and when the appropriate limit exists 

$$\lim_{n \to \infty} p_{i,n} = p_i$$


For the yes-no procedure $p_1$ is the 
asymptotic probability of a yes report 
when the signal is presented (the 
likelihood of a “hit”) and $p_2$ is the 
probability of a yes report when noise 
alone is presented (the likelihood of a 
“false alarm”). In the literature, 
plots of the relation of $p_2$ to $p_1$ are 
commonly called ROC curves, which 
stands for receiver operating characteristic 
curves. It is important to note 
that we use the term ROC curve in 
reference to both the yes-no and 
forced-choice method. When one 
deals with $n$ interval forced-choice 
problems, then the ROC curve is a 
surface in $n$ space and predictable from 
the theory. 
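
As a small illustration of the two quantities just defined, the following sketch estimates $p_1$ and $p_2$ from a record of (stimulus, response) pairs. The function name `roc_point` and the coding of stimuli and responses as the integers 1 and 2 are assumptions made for this example only; they are not part of the theory.

```python
def roc_point(trials):
    """Estimate (p1, p2) from (stimulus, response) pairs coded 1 or 2.

    p1 is the proportion of A1 responses on S1 trials ("hits" in yes-no terms);
    p2 is the proportion of A1 responses on S2 trials ("false alarms").
    """
    s1_responses = [resp for stim, resp in trials if stim == 1]
    s2_responses = [resp for stim, resp in trials if stim == 2]
    if not s1_responses or not s2_responses:
        raise ValueError("need at least one S1 trial and one S2 trial")
    p1 = s1_responses.count(1) / len(s1_responses)
    p2 = s2_responses.count(1) / len(s2_responses)
    return p1, p2


# Hypothetical eight-trial record, used only to exercise the function.
record = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 2), (2, 2), (1, 1), (2, 2)]
print(roc_point(record))  # prints (0.75, 0.25)
```

One such pair per experimental condition gives one point of an ROC plot.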

This paper treats the effects of three 
classes of variables: the physical pa- 
rameters of the stimulus presentation 
set; the trial-to-trial schedule for 
presenting stimuli; and, the class of 
variables such as monetary payoffs 
and instructions that are viewed as 
influencing the motivation and set of 
the subject. To simplify the discus- 
sion, we shall consider only a simple 
probabilistic scheme for presenting 
stimuli; namely, 


$$\Pr(S_{1,n}) = \gamma \qquad [2]$$


where $\gamma$ is constant over trials. More 
complex stimulus schedules can be 
analyzed; e.g., the stimulus presentation 
on Trial $n$ might depend on the 
response on Trial $n - k$, or on the 
stimulus on Trial $n - k'$, or both. 
However, an analysis of this simpler 
schedule will be sufficient to illustrate 
the basic concepts and encompasses 
most of the experimental literature. 


Axioms and Rules of Identification 


The hypothesized sensory state of 
the subject that results from the 
presentation of an external stimulus is 
specified in terms of two sensory 
patterns $s_1$ and $s_2$ and a set S* of 
stimulus patterns associated with 
background stimulation. These stim- 
ulus patterns are theoretical con- 
structs to which we will assign certain 
properties. They are not the receptor 
neurons of neurophysiology but a 
schematic representation of the phys- 
ical stimulus, having certain simple 
and uniform properties. 

On every trial a single pattern is 
sampled from the background set S* 
and simultaneously one of the sensory 
patterns may or may not be activated. 
If the $s_1$ sensory pattern is activated, 
an $A_1$ response will occur; if $s_2$ is 
activated, an $A_2$ will occur. If neither 
sensory pattern is activated, the 
subject makes the response to which 
the background pattern is condi- 
tioned. Conditioning of elements in 
S* may change from trial to trial via a 
simple learning process. 

The likelihood of activating Sensory 
Pattern $s_i$ given Stimulus Event $S_i$ on 
Trial $n$ (and thereby insuring a correct 
response) is denoted as $m_{i,n}$. The 
parameter $m_{i,n}$ is a measure of the 
subject’s momentary sensitivity level 
and may fluctuate from trial to trial. 
However, the momentary sensitivity 
level is bounded between zero and $M_i$, 
and the parameter $M_i$ represents the 
subject's maximum level of sensitivity 
to a fixed signal. The parameters 
$M_1$ and $M_2$ are to be interpreted as 
measures of the physical characteristics 
of $S_1$ and $S_2$ and are monotonic 
with signal strength. Further, we 
assume that variables such as stimulus 
presentation schedule, instructions, 
monetary payoffs, and experimental 
design have no effect on $M_1$ and $M_2$. 

Changes in sensitivity level occur 
from trial to trial and depend on 
previous events. Specifically, if the 
subject tends to do well (i.e., emit 
correct responses) by ignoring the 
sensory patterns when they are acti- 
vated and responding in terms of the 
background stimuli alone, then he will 
tend to lower his level of sensitivity. 
If, however, he tends to do poorly by 
basing his response solely on the back- 
ground cues, then he will tend to raise 
the value of $m_{i,n}$. Roughly speaking, 
we assume that there is a certain cost 
associated with maintaining a high 
level of sensitivity and view the sub- 
ject as being predisposed to reduce his 
sensitivity level whenever possible. 
However, the subject’s tendency to 
lower his sensitivity level is counter- 
acted if the reduction gives rise to a 
significant decrement in his ability to 
perform effectively. Thus the activa- 
tion process can be described as a 
negative feedback system in which the 
cost associated with maintaining a 
high level of sensitivity interacts with 
the cost associated with a decrement 
in performance so as to determine a 
momentary level of sensitivity. The 
parameters that specify the increments 
and decrements in sensitivity 
are $\mu$ and $\delta$, respectively, and we assume that their 
values may change if the subject’s 
motivation or set changes. We return 
to this point later. The concept of a 
variable level of sensitivity is not new 
and there is considerable experimental 
evidence at both the behavioral and 
physiological level to support the idea 
(e.g., Blackwell, 1953; Guilford, 1927; 
Howarth & Bulmer, 1956; Oldfield, 
1955; Verplanck, Collier, & Cotton, 
1952; Wertheimer, 1953). In addition, 
notions of this sort have played a role 
in the speculations of Gestalt psychologists 
(e.g., Köhler, 1947) and, 
more recently, in theoretical developments 
regarding the interplay between 
the reticular system and the associa- 
tion cortex (Lindsley, 1958). The 
important feature of the present 
theory is the relation postulated 
between variations in the sensi- 
tivity level and past stimulus-response 
events. 

The axioms will be formulated ver- 
bally; it is not difficult to state them 
in a mathematically exact form, but 
for our purposes this will not be 
necessary. The axioms fall into three 
groups: the first group deals with the 
activation process; the second, with 
the decision process; and the last 
group with variations in sensitivity. 

Activation axioms. A1. If $S_i$ occurs 
on Trial $n$, then Sensory Pattern $s_i$ will 
be activated with probability $m_{i,n}$. 

A2. Exactly one pattern is sampled 
from set S* on every trial. Given the 
set S* of N patterns, the probability 
of sampling a particular element is 
1/N, independent of trial number and 
preceding events. 

Response axioms. R1. If Sensory 
Pattern $s_i$ is activated, then the $A_i$ 
response will occur. If neither sensory 
pattern is activated, then the response 
to which the sampled pattern from S* 
is conditioned will occur. 

R2. On every trial each pattern in 
S* is conditioned to either $A_1$ or $A_2$. 
If a pattern from S* is sampled on a 
trial, it becomes conditioned with 
probability $\theta_i$ to the $A_i$ response if $E_i$ 
occurs on that trial; if it is already 
conditioned to that response, it remains 
so. 

Sensitivity level axioms. L1. The 
parameter $M_i$ specifies the maximum 
value of $m_{i,n}$. Further 


$$m_{i,n} = w_n M_i$$


L2. The weighting function $w_n$ 
changes from trial to trial as follows: 


$$w_{n+1} = \Lambda_n^{(\xi)} \left[ (1 - \delta) w_n \right] + \left[ 1 - \Lambda_n^{(\xi)} \right] \left[ (1 - \mu) w_n + \mu \right]$$


The function $\Lambda_n^{(\xi)}$ denotes the proportion 
of trials from Trial $n - \xi + 1$ 
to Trial $n$ on which the information 
event $E_i$ agreed with the response 
conditioned to the pattern sampled 
from S*. 
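
The axioms can be read as a trial-by-trial procedure, and a minimal simulation sketch may make that reading concrete. Several details below are assumptions of the sketch rather than statements of the theory: the function name `simulate`, the random initial conditioning of the background patterns, the starting weight `w0`, the default values of `N` and `xi` (the scan length), and the use of however many trials are available when fewer than `xi` have elapsed.

```python
import random
from collections import deque


def simulate(n_trials, gamma, M1, M2, theta1, theta2, mu, delta,
             N=4, xi=10, w0=1.0, seed=0):
    """Return a list of (stimulus, response) pairs, each coded 1 or 2."""
    rng = random.Random(seed)
    # Assumption: background patterns start out conditioned at random.
    conditioned = [rng.choice((1, 2)) for _ in range(N)]
    w = w0
    # 1 where the background cue alone would have been correct (last xi trials).
    agreed = deque(maxlen=xi)
    trials = []
    for _ in range(n_trials):
        stim = 1 if rng.random() < gamma else 2        # presentation schedule
        M = M1 if stim == 1 else M2
        k = rng.randrange(N)                           # A2: sample one pattern
        activated = rng.random() < w * M               # A1, with m = w * M (L1)
        resp = stim if activated else conditioned[k]   # R1
        agreed.append(1 if conditioned[k] == stim else 0)
        theta = theta1 if stim == 1 else theta2
        if rng.random() < theta:                       # R2: conditioning to E_i
            conditioned[k] = stim
        lam = sum(agreed) / len(agreed)
        # L2: lower w when the background cue sufficed, raise it otherwise.
        w = lam * (1 - delta) * w + (1 - lam) * ((1 - mu) * w + mu)
        trials.append((stim, resp))
    return trials


protocol = simulate(1000, gamma=0.5, M1=0.7, M2=0.5,
                    theta1=0.2, theta2=0.2, mu=0.1, delta=0.1)
print(protocol[:5])
```

The returned protocol can be summarized by conditional response proportions or by sequential statistics; no particular output is claimed here because it depends on the seed.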

The distinction between yes-no and 
forced-choice methods is specified in 
terms of the parameters $M_1$ and $M_2$. 
To explicate the distinction between 
these two experimental procedures we 
redefine $M_1$ and $M_2$ in terms of the 
more molecular parameters $\sigma$ and $\eta$. 
Consider a limiting condition in which 
the subject is performing at his highest 
level of sensitivity (i.e., $w = 1$). 
Under these conditions, if a signal is 
presented in noise we assume that the 
subject either detects the signal (with 
probability $\sigma$) or is uncertain whether 
the signal occurred. Similarly, when 
noise alone is presented we assume 
that the subject either detects the 
absence of a signal (with probability 
$\eta$) or is uncertain whether or not the 
signal occurred. The three events 
will be denoted as follows: $s$ = detected 
signal; $\bar{s}$ = detected omission 
of signal; and $u$ = uncertain. For the 
yes-no method the occurrence of $s$ is 
identified with the activation of 
Sensory Pattern $s_1$ and therefore a 
“yes” response; $\bar{s}$ with the activation 
of $s_2$ and the occurrence of a “no” 
response; and the event $u$ with the 
activation of neither $s_1$ nor $s_2$ and 
consequently the occurrence of the 
response conditioned to the element 
sampled from S*. Hence for the 
yes-no procedure 


$$M_1 = \sigma, \qquad M_2 = \eta \qquad [3]$$

For the forced-choice procedure the 
analysis is similar. Consider an $S_1$ 
trial: signal plus noise in the first 
interval followed by noise alone in the 
second interval. One of the following 
event sequences can occur: 


1. Event $s$ occurs in the first interval 
and is followed by Event $\bar{s}$ 
in the second interval, with 
probability $\sigma\eta$ 

2. $s$ followed by $u$, with probability 
$\sigma(1 - \eta)$ 

3. $u$ followed by $\bar{s}$, with probability 
$(1 - \sigma)\eta$ 

4. $u$ followed by $u$, with probability 
$(1 - \sigma)(1 - \eta)$. 

Information transmitted by either 
Outcome 1, 2, or 3 suffices to identify 
the trial, and therefore the occurrence 
of any one of these outcomes is associated 
with the activation of Sensory 
Pattern $s_1$ and the occurrence of the 
$A_1$ response. If the fourth outcome 
occurs, we assume that neither sensory 
pattern is sampled.² Therefore, 
$M_1 = \sigma\eta + \sigma(1 - \eta) + (1 - \sigma)\eta$ and 
by a similar argument it can be shown 
that $M_1 = M_2$. Hence for the forced-choice 
method 


$$M_1 = M_2 = \sigma + \eta - \sigma\eta \qquad [4]$$


2 In formulating a model that also treated 
choice time and confidence ratings it would 
be natural to distinguish among Outcomes 
1 to 3. However, for an analysis of response 
selection, such a distinction is not necessary. 
Also, note that the assignment of probabilities 
to the four outcomes assumes no time-order 
effect; i.e., no interaction between events in 
one temporal interval and the next. For a 
given experimental situation, the precision 
of the comparison between the forced-choice 
and the yes-no method will depend on the 
accuracy of this assumption. 

In theory, once $\sigma$ and $\eta$ have been 
estimated, say, for the yes-no method, 
they can be used to make predictions for the 
forced-choice procedure. In this regard 
note that (for fixed values of $\sigma$ 
and $\eta$) the parameter $M_1 = M_2$ for the 
forced-choice method is always greater 
than or equal to $M_1$ and $M_2$ for the 
yes-no method. 
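
The mapping from the molecular parameters to the maximum sensitivities can be sketched in a few lines. The function name `max_sensitivity` and the string labels for the two procedures are hypothetical; the arithmetic follows the yes-no identification and the three identifying outcomes listed for the forced-choice trial.

```python
def max_sensitivity(sigma, eta, procedure):
    """Return (M1, M2) for detection probabilities sigma and eta."""
    if procedure == "yes-no":
        # s activates s1 (probability sigma); s-bar activates s2 (probability eta).
        return sigma, eta
    if procedure == "forced-choice":
        # Outcomes 1 to 3 identify the trial; only (u, u) leaves it undecided.
        m = sigma * eta + sigma * (1 - eta) + (1 - sigma) * eta
        return m, m
    raise ValueError("procedure must be 'yes-no' or 'forced-choice'")


for sigma, eta in [(0.7, 0.5), (0.7, 0.1)]:
    print(sigma, eta,
          max_sensitivity(sigma, eta, "yes-no"),
          max_sensitivity(sigma, eta, "forced-choice"))
```

Because the forced-choice value equals $1 - (1 - \sigma)(1 - \eta)$, it can never fall below either yes-no value.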

In the present formalization of the 
theory only Events $s$ and $u$ can occur 
given signal plus noise and only Events 
$\bar{s}$ and $u$, given noise alone. When the 
model was first developed, we permitted 
$s$, $\bar{s}$, and $u$ to occur (with 
different probability distributions) 
given either signal plus noise or noise 
alone. However, in the analysis of 
several sets of data (Atkinson & Carterette, 
in preparation³; Carterette & 
Wyman, 1962; Kinchla, 1962) estimates 
of the probability of Event $s$ 
given noise and the probability of $\bar{s}$ 
given signal plus noise were consistently 
equal to zero. Hence for the 
present discussion we have chosen to let 
$\Pr(s \mid \text{noise alone}) = \Pr(\bar{s} \mid \text{signal plus noise}) = 0$ 
and thereby simplify 
the presentation. It also is interesting 
that in the analysis of the above data 
the estimate of $\eta$ was very close to 
zero. In fact, by setting $\eta = 0$ the 
correspondence between theoretical 
and observed values was not much 
different than when a separate estimate 
of the parameter was made. 
However, even for small values of $\eta$ 
the $\bar{s}$ event plays an important role in 
accounting for second choice data in 
multiinterval forced-choice experiments 
and for this reason the simplifying 
assumption of $\eta = 0$ was not 
made. 


3 Atkinson, R. C., & Carterette, E. C. 
Signal detection as a function of the stimulus 
presentation schedule: A comparison of 
forced-choice and yes-no procedures (in 
preparation). 


Asymptotic Response Probabilities and 
ROC Curves 


If we let $\psi_n$ denote the proportion 
of elements in S* conditioned to an 
$A_1$ response at the start of Trial $n$, then 
(by Axioms A2 and R2) we may write 
the following difference equation: 


$$\psi_{n+1} = \psi_n \left[ 1 - \gamma \frac{\theta_1}{N} - (1 - \gamma) \frac{\theta_2}{N} \right] + \gamma \frac{\theta_1}{N}$$


This recursion can be solved by standard 
methods (see Atkinson & Estes, 
1963) to yield the explicit formula 


$$\psi_n = \psi - (\psi - \psi_1) \left[ 1 - \frac{\theta_1}{N} \left\{ \gamma + (1 - \gamma) \beta \right\} \right]^{n-1}$$


where 


$$\psi = \frac{\gamma}{\gamma + (1 - \gamma) \beta} \qquad [5]$$


and the response bias parameter $\beta = \theta_2 / \theta_1$. 
The quantity $\psi$ denotes $\lim_{n \to \infty} \psi_n$ and 
is the asymptotic probability of an $A_1$ 
response given that an element from 
S* determines the subject’s response. 
For most analyses we shall be concerned 
with response protocols that 
may be viewed as asymptotic data. 
Hence, in general, theoretical results 
are presented only for the case in 
which $n$ is large. 
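
A quick numerical check of the conditioning recursion against its explicit solution is sketched below. The recursion coded here is the one that follows from Axioms A2 and R2 for the expected proportion of background patterns conditioned to $A_1$; the function names and the sample parameter values are assumptions of the sketch.

```python
def psi_recursion(psi1, gamma, theta1, theta2, N, n):
    """Iterate the difference equation from Trial 1 up to Trial n."""
    psi = psi1
    for _ in range(n - 1):
        psi = psi * (1 - (gamma * theta1 + (1 - gamma) * theta2) / N) \
              + gamma * theta1 / N
    return psi


def psi_closed_form(psi1, gamma, theta1, theta2, N, n):
    """Explicit solution written with the response bias beta = theta2 / theta1."""
    beta = theta2 / theta1
    psi_limit = gamma / (gamma + (1 - gamma) * beta)
    rate = 1 - (theta1 / N) * (gamma + (1 - gamma) * beta)
    return psi_limit - (psi_limit - psi1) * rate ** (n - 1)


args = dict(psi1=0.2, gamma=0.6, theta1=0.3, theta2=0.15, N=5)
for n in (1, 10, 100):
    print(n, psi_recursion(n=n, **args), psi_closed_form(n=n, **args))
```

With these values $\beta = 0.5$, so both columns approach $0.6 / (0.6 + 0.4 \times 0.5) = 0.75$ as $n$ grows.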

Using techniques similar to those 
employed in Equation 5 and applying 
Axiom L2 yields an expression for 
$\lim_{n \to \infty} w_n = w$; namely, 


$$w = \frac{1 - \Lambda}{(1 - \Lambda) + \alpha \Lambda} \qquad [6]$$


where the activation parameter $\alpha = \delta / \mu$ 
and $\Lambda = \gamma \psi + (1 - \gamma)(1 - \psi)$. In 
the statement of Axiom L2 we assume 
that the amount by which $w_n$ increases or 
decreases on a trial depends on $\Lambda_n^{(\xi)}$, 
the value of this function being the 
proportion of times on the last $\xi$ trials 
on which the subject would have been 
correct by ignoring the sensory pattern 
and responding solely in terms 
of the background cue. It is interesting 
that the asymptotic expression 
for $w_n$ in Equation 6 is not a function 
of $\xi$; i.e., independent of the number 
of trials the subject scans over, the 
value of $w$ depends only on $\alpha$, $\beta$, and 
$\gamma$. To be more exact, at asymptote 
the random variable associated with 
the weighting function has an expectation 
of $w$ independent of $\xi$; 
however, the variance of the distribution 
does depend on $\xi$, being 
maximum when $\xi = 1$ and approaching 
zero as $\xi$ becomes large. Analyses 
of data reported by Carterette and 
Wyman (1963), and Atkinson and 
Carterette³ yielded estimates of $\xi$ that 
were quite large. In view of these 
empirical results and for reasons of 
mathematical simplicity we will, in 
general, assume that $\xi \to \infty$. Later 
the effect of $\xi$ on sequential predictions 
will be discussed but, otherwise, 
the mathematical results presented in 
this paper will be for the case where 
the scan range is large. 
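
For the large-scan case, the asymptotic weight can be checked by iterating the sensitivity axiom with the proportion of background-correct trials held at its asymptotic value. This is a sketch under two assumptions: the activation parameter is taken as the ratio $\delta / \mu$ (decrement rate over increment rate), and the function names are invented for the illustration.

```python
def asymptotic_w(gamma, beta, alpha):
    """Fixed point of the weight recursion, with alpha = delta / mu > 0."""
    psi = gamma / (gamma + (1 - gamma) * beta)
    lam = gamma * psi + (1 - gamma) * (1 - psi)   # background cue correct
    return (1 - lam) / ((1 - lam) + alpha * lam)


def iterate_w(gamma, beta, mu, delta, w0=1.0, steps=2000):
    """Apply the weight update repeatedly with Lambda fixed at its limit."""
    psi = gamma / (gamma + (1 - gamma) * beta)
    lam = gamma * psi + (1 - gamma) * (1 - psi)
    w = w0
    for _ in range(steps):
        w = lam * (1 - delta) * w + (1 - lam) * ((1 - mu) * w + mu)
    return w


print(asymptotic_w(0.7, 2.0, alpha=3.0))
print(iterate_w(0.7, 2.0, mu=0.1, delta=0.3))
```

The two printed values should agree to many decimal places, since the iteration contracts toward the fixed point whenever $\mu$ and $\delta$ are not both zero.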

Employing our previous results, 
and using Axioms A1 and A2 we 
obtain: 


$$p_1 = m_1 + (1 - m_1) \psi \qquad [7a]$$

$$p_2 = (1 - m_2) \psi \qquad [7b]$$

where $m_i = \lim_{n \to \infty} m_{i,n}$, and 

$$m_i = w M_i \qquad [8]$$


An inspection of Equations 7 and 8 
indicates that $p_1$ and $p_2$ are functions 
of $M_1$, $M_2$, $\alpha$, $\beta$, and $\gamma$. Of course $\gamma$ is 
specified by the experimenter and 
therefore, to fit any ROC curve, four 
parameters need to be estimated. 
However, for most applications restrictions 
are appropriate that reduce 
this number. For example, in a 
forced-choice experiment the symmetry 
between $S_1$ and $S_2$ stimuli is 
such as to require that $\theta_1 = \theta_2$ (unless 
the subject has a bias extraneous to 
the experiment that favors one response 
over the other) and hence 
$\beta = 1$. Further, by an earlier argument 
(see Equation 4) we require that 
$M_1 = M_2$. Therefore, in a forced-choice 
procedure the ROC curve 
depends only on $M$ and $\alpha$. 
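
Putting the asymptotic results together, the two response probabilities can be computed directly from the five quantities named above. The sketch assumes the activation parameter is the ratio $\delta / \mu$ and that it is strictly positive; `asymptotic_probs` is a hypothetical name.

```python
def asymptotic_probs(M1, M2, alpha, beta, gamma):
    """Return (p1, p2): Pr(A1 | S1) and Pr(A1 | S2) at asymptote."""
    psi = gamma / (gamma + (1 - gamma) * beta)    # background bias toward A1
    lam = gamma * psi + (1 - gamma) * (1 - psi)   # background cue correct
    w = (1 - lam) / ((1 - lam) + alpha * lam)     # asymptotic weight
    m1, m2 = w * M1, w * M2                       # asymptotic sensitivities
    p1 = m1 + (1 - m1) * psi
    p2 = (1 - m2) * psi
    return p1, p2


# Forced-choice case: beta = 1 and a single M for both stimuli.
print(asymptotic_probs(0.85, 0.85, alpha=1.0, beta=1.0, gamma=0.5))
```

In this symmetric forced-choice case with $\gamma = 1/2$ the two probabilities sum to one, as the symmetry between the two intervals would lead one to expect.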

ROC curves. We now examine two 
methods for experimentally generating 
ROC curves. One procedure is to 
vary the schedule for presenting $S_1$ 
and $S_2$; for purposes of the present 
paper this involves varying $\gamma$ from 
session to session while holding all 
other factors constant (Tanner, Swets, 
& Green, 1956). Another method for 
generating ROC curves is to manipulate 
instructional variables and/or 
payoffs from one experimental session 
to another while using the same 
stimuli and holding $\gamma$ fixed (Swets, 
Tanner, & Birdsall, 1955). The predictions 
for each of these cases will be 
examined separately. 

Consider first the case in which $\gamma$ is 
permitted to vary while all other 
factors remain unchanged. Under 
these conditions it is assumed that the 
instructions and payoffs specify fixed 
values of the response bias parameter 
$\beta$ and the activation parameter $\alpha$. 
Also $M_1$ and $M_2$ are not affected by 
the value of $\gamma$ for, in theory, they 
depend only on the physical characteristics 
of the stimulus presentation set. 
Therefore, for a given experimental 
situation $M_1$, $M_2$, $\alpha$, and $\beta$ are fixed, 
and variations in $p_1$ and $p_2$ induced by 
manipulating the schedule for presenting 
$S_1$ and $S_2$ must be accounted 
for strictly by variations in $\gamma$. 

If we hold $M_1$, $M_2$, $\alpha$, and $\beta$ constant 
and vary $\gamma$ between 0 and 1 (the 
permissible range), then the ROC 
curve defined by Equation 7 is, in 
general, a monotone increasing function 
that originates at point (0, 0) and 
terminates at point (1, 1). However, 
it is necessary to be more precise and 
distinguish three cases: 

1. If $\delta = 0$ and $\mu > 0$, then asymptotically 
the subject performs at his 
maximum level of sensitivity independent 
of other factors, and the ROC 
curve is given by the linear function 


$$p_1 = \frac{1 - M_1}{1 - M_2} p_2 + M_1 \qquad [9]$$


2. If $\delta > 0$ and $\mu = 0$, then asymptotically 
the subject performs at his 
minimum level of sensitivity, and the 
ROC curve is simply 


$$p_1 = p_2 \qquad [10]$$


3. For the general case where $\mu$ and 
$\delta$ are both greater than zero, the ROC 
curve is a nonlinear monotone increasing 
function bounded between 
Equation 9 and Equation 10 that 
originates at (0, 0) and terminates at 
(1, 1). 


Fig. 1. ROC curves generated by manipulating the presentation schedule of stimulus events (forced-choice method panel; axes $\Pr(A_1 \mid S_1)$ and $\Pr(A_1 \mid S_2)$). 

Figure 1 gives several ROC curves 
for both yes-no and forced-choice 
procedures when $\beta = 1$, $\sigma = .7$, and 
$\eta = .5$ or $.1$. The parameter on each 
set of curves is the value of $\alpha$. Successive 
points on an individual curve 
were swept out by letting $\gamma$ vary from 
0 to 1. For the general case, $\alpha$ is a 
ratio of two nonzero probabilities and 
hence takes any value greater than 
zero. For $\alpha = \delta / \mu$ close to zero (high sensitivity 
level, the limit of Case 1) the ROC curve approaches 
the line given by Equation 9; as $\alpha$ becomes 
large the curve tends toward the line 
$p_1 = p_2$. Further, as 
indicated in Figure 1, when $\alpha$ and $\beta$ 
are the same in both the yes-no and 
the forced-choice procedure, then (by 
the conditions of Equations 3 and 4) 
the theory predicts that the ROC 
curve generated by the forced-choice 
group will be above the ROC curve 
for the yes-no group. 

It also can be shown that the ROC 
curve defined by varying $\gamma$ is either 
symmetric about the main diagonal 
from point (0, 1) to (1, 0), skewed 
right, or skewed left. For symmetry 
we require $M_1 = M_2$ and $\beta = 1$; 
otherwise the curve may be skewed 
right or left. Note that the condi- 
tions that specify a symmetric ROC 
curve hold in the forced-choice experi- 
ment; they may or may not hold for 
different yes-no experiments. 
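
The properties claimed for ROC curves swept out by the presentation probability can be inspected numerically. The sketch below uses the parameter values quoted for Figure 1 ($\beta = 1$, $\sigma = .7$, $\eta = .5$) with an activation parameter of 1, taken as the ratio $\delta / \mu$ and required to be positive; the function name `roc_curve` and the grid of 21 points are assumptions of the example.

```python
def roc_curve(M1, M2, alpha, beta, n_points=21):
    """Sweep gamma over [0, 1]; return a list of (p2, p1) points (alpha > 0)."""
    points = []
    for i in range(n_points):
        gamma = i / (n_points - 1)
        psi = gamma / (gamma + (1 - gamma) * beta)
        lam = gamma * psi + (1 - gamma) * (1 - psi)
        w = (1 - lam) / ((1 - lam) + alpha * lam)
        p1 = w * M1 + (1 - w * M1) * psi
        p2 = (1 - w * M2) * psi
        points.append((p2, p1))
    return points


sigma, eta = 0.7, 0.5
yes_no = roc_curve(sigma, eta, alpha=1.0, beta=1.0)
m_fc = sigma + eta - sigma * eta              # forced-choice M1 = M2
forced = roc_curve(m_fc, m_fc, alpha=1.0, beta=1.0)

for (x_yn, y_yn), (x_fc, y_fc) in zip(yes_no[::5], forced[::5]):
    print(f"yes-no ({x_yn:.3f}, {y_yn:.3f})   forced-choice ({x_fc:.3f}, {y_fc:.3f})")
```

At every value of the presentation probability the forced-choice point lies up and to the left of the yes-no point, and the forced-choice curve maps onto itself under reflection about the diagonal from (0, 1) to (1, 0).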

Another method for generating 
ROC curves is to fix both $\gamma$ and the 
signal intensity, and manipulate instructions 
and/or payoffs from one 
experimental session to another. 
Under these conditions $M_1$ and $M_2$ 
would be constant over sessions but 
we might assume that the response 
parameter and the activation parameter 
vary. Thus the ROC curve 
produced by changing instructions or 
payoffs would theoretically be explained 
by variations in $\alpha$ and/or $\beta$ 
given fixed values of $M_1$, $M_2$, and $\gamma$. 
In the discussion of this method we 
let $\gamma = 1/2$; this condition simplifies 
the mathematics and includes most of 
the experimental work. We examine 
first the cases in which only $\alpha$ or $\beta$ 
is permitted to vary and then the case 
in which they vary concomitantly. 

If we hold the bias parameter $\beta$ 
constant and let $\alpha$ vary from 0 to $\infty$, 
then the ROC curve is a straight line 
segment between the point 


$$p_1 = M_1 + \frac{1 - M_1}{1 + \beta}, \qquad p_2 = \frac{1 - M_2}{1 + \beta}$$

and the point 

$$p_1 = \frac{1}{1 + \beta}, \qquad p_2 = \frac{1}{1 + \beta}$$


That is, as the activation parameter 
varies (and all other parameters are 
fixed) we move along the function 


$$p_1 = -\frac{M_1}{M_2} \beta p_2 + \frac{1}{1 + \beta} \left[ 1 + \beta \frac{M_1}{M_2} \right] \qquad [11]$$


Such a prediction readily can be 
realized experimentally. For the 
forced-choice method $\beta$ is fixed and we 
could manipulate $\alpha$ by varying the 
amount of payoff for a correct response 
from one experimental session 
to another. Then, the ROC curve 
generated over experimental sessions 
would be specified by Equation 11. 
Such an experiment has been conducted 
by Blackwell (1953) and this 
is precisely the type of effect observed. 

To be sure, the ROC function given 
by Equation 11 is rather different 
from the typical curve that one thinks 
of with regard to signal detection. 
However, there is no doubt that such
functions can be generated experimentally
by symmetrically manipulating
motivation variables in the
forced-choice problem. In this re-
gard, it should be noted that the ROC 
curve has been referred to in the 
literature as an equisensitivity curve 
(Luce, 1961). For theories of signal 
detection that have static concepts of 
the activation process, such a term is 
appropriate because all points on the 
function represent equally sensitive 
activation levels. However, from our 
viewpoint the term equisensitive 
does not connote the correct meaning, 
for we admit the possibility of gen- 
erating an ROC curve via variations 
in sensitivity. Specifically, in terms 
of the present theory, ROC curves 
may arise in the following ways: 
experimentally manipulating parame- 
ters that affect the activation process 
but leave the decision process un- 
changed (e.g., Equation 11); manip- 
ulating parameters that affect the 
decision process but leave the activa- 
tion process unchanged (e.g., Equa- 
tion 12); or manipulating parameters 
that affect changes in both the 
activation and decision processes (e.g., 
the case in which $\gamma$ varies while all
other parameters are fixed). 

If we hold $\alpha$ fixed and let $\beta$ vary
(for $M_i$ fixed and $\gamma = 1/2$), then the
ROC curve is given by the function


$$p_1 = \frac{1 - m_1}{1 - m_2}\,p_2 + m_1 \qquad [12]$$


We know of no experimental results 
that relate to this prediction. 

Finally, in a yes-no experiment it
seems reasonable to assume that both
$\alpha$ and $\beta$ may vary simultaneously as
instructions and/or payoff change.
To illustrate the type of effect that
can be obtained, consider the case in
which $\alpha = f(\beta)$ such that the function
$f$ is strictly monotone increasing and
$f(0) = 0$. Under these conditions, if $\beta$
varies between 0 and $\infty$, then a convex
ROC curve is traced out from point
($p_1 = 1$, $p_2 = 1 - M_2$) to point
($p_1 = 0$, $p_2 = 0$) that is bounded
between Equations 9 and 10. The
degree of convexity and the symmetry
of the ROC curve will depend on the
function $f$. In this regard, it is
interesting to view the estimate of $f$
for a given set of data as a device for
scaling the effects of instructions and
payoffs.

In terms of the above discussion, 
it should be obvious that virtually any 
ROC curve can be fitted by selecting 
appropriate parameter values. Thus, 
within the framework of the present 
theory, the ability of the model to fit 
ROC data is a rather trivial test. It is 
for this reason that we now turn to 
more detailed predictions regarding 
the fine structure of signal detection 
data. 


Sequential Predictions 


It has long been recognized that 
rather complex trial-to-trial depend- 
encies are involved in most psycho- 
physical data. Some particularly 
striking effects have been reported by 
Carterette and Wyman (1963), Ho- 
warth and Bulmer (1956), and Ver- 
plank, Collier, and Cotton (1952); 
these experimenters have demon- 
strated that detection rates (even for 
sophisticated subjects) may increase 
or decrease depending on the im- 
mediately prior sequence of stimulus- 
response events. In this section we 
present some sequential predictions 
for signal detection studies, having 
selected those quantities that are 
particularly useful in making esti- 
mates of parameters. The reader is 
referred to Suppes and Atkinson 
(1960, Ch. 2) for a discussion of ap- 
propriate estimation procedures. 

We shall examine predictions regarding
the influence of stimulus and
response events on Trial $n$ as they
affect the response on Trial $n + 1$.
Specifically,

$$\Pr(A_{1,n+1} \mid S_{i,n+1} A_{j,n} S_{k,n})$$

where $i, j, k = 1, 2$. Explicit expressions
for these quantities can be
derived from the axioms. The actual
derivations are quite lengthy and will
not be presented here; the reader
interested in the mathematical techniques
involved should consult Atkinson
and Estes (1963). Also, for
purposes of this paper, the analysis of
sequential effects will be confined to
asymptotic statistics. To simplify
notation the quantity

$$\lim_{n \to \infty} \Pr(A_{1,n+1} \mid S_{i,n+1} A_{j,n} S_{k,n})$$

will be written as $\Pr(A_1 \mid S_i A_j S_k)$.
The expressions for these probabilities
are as follows:


[Equations 13a to 13h: expressions for $\Pr(A_1 \mid S_i A_j S_k)$, $i, j, k = 1, 2$, in terms of the model parameters; the right-hand sides are not legible in this copy.]

To obtain $\Pr(A_2 \mid S_i A_j S_k)$ one need
only note that

$$\Pr(A_1 \mid S_i A_j S_k) + \Pr(A_2 \mid S_i A_j S_k) = 1.$$
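
The observed counterparts of these sequential probabilities can be tabulated directly from a trial record. The following is a minimal sketch, not the authors' procedure: it assumes stimuli and responses are coded 1 and 2, that the record comes from the asymptotic portion of a session, and the function name `sequential_proportions` is introduced here only for illustration.

```python
from collections import defaultdict


def sequential_proportions(stimuli, responses):
    """Estimate Pr(A1 | S_i on trial n+1, A_j on trial n, S_k on trial n).

    stimuli, responses: equal-length sequences of 1s and 2s, one per trial.
    Returns a dict mapping (i, j, k) to the proportion of A1 responses.
    """
    if len(stimuli) != len(responses):
        raise ValueError("stimuli and responses must have the same length")
    counts = defaultdict(lambda: [0, 0])  # key -> [A1 count, total count]
    for n in range(len(stimuli) - 1):
        key = (stimuli[n + 1], responses[n], stimuli[n])
        counts[key][1] += 1
        if responses[n + 1] == 1:
            counts[key][0] += 1
    return {key: a1 / total for key, (a1, total) in counts.items()}


# Tiny made-up record, for illustration only.
table = sequential_proportions([1, 1, 2, 1, 1], [1, 1, 2, 2, 1])
for key in sorted(table):
    print(key, table[key])
```

The proportion of A2 responses for any cell is one minus the tabulated value, in line with the identity above.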


The expressions in Equation 13 are
rather formidable looking, but numerical
predictions can be easily
calculated once values for the parameters
have been obtained. Furthermore,
independently of the
parameter values, certain relations
among the sequential probabilities can
be specified. For example, it can be
easily shown that

$$\Pr(A_1 \mid S_i A_1 S_1) > \Pr(A_1 \mid S_i A_2 S_1)$$

or that

$$\Pr(A_1 \mid S_i A_1 S_1) > \Pr(A_1 \mid S_i A_1 S_2)$$

for $i = 1, 2$ and for any values of
$\gamma$, $M_1$, and $M_2$.

To indicate the nature of these
predictions we shall examine some
data from two subjects run in a
forced-choice auditory experiment.
Two temporal intervals were defined 
on each trial by the onset and offset of 
two lights. A band-limited Gaussian 
noise (the masking stimulus) was 
present continuously throughout the 
experimental session and on every 
trial one of the two temporal intervals 
contained a fixed intensity, 1,000 cps 
tone. The subject pressed one button 
if he believed the signal was in the 
first interval or pressed a second 
button if he believed the signal was in 
the second interval. The experi- 
mental procedure is described in 
detail in Atkinson and Carterette’; 
that paper deals with an analysis of 
forced-choice and yes-no data from 
12 subjects, each run for 350 trials 
per day for 30 days. 

The data we present here are not to
be regarded as a test of the theory,
but only as an illustration of some of the
predictions. Table 1 presents the
observed values for $p_1$, $p_2$, and
$\Pr(A_1 \mid S_i A_j S_k)$. The value of $\gamma$ was
set at 1/2 in the experiment and, since
a forced-choice method was used, we
assume that $\beta = 1$ (i.e., $\theta_1 = \theta_2 = \theta$).
Given that $\beta = 1$ and $\gamma = 1/2$ we
have, via Equation 5, that y = 1/2.
Knowing y and the observed value
of $p_1$, Equation 7a may be used
to obtain an estimate of $m_1$; namely,
$m_1 + (1 - m_1)/2 = .73$ or $m_1 = .46$.
Further, for the forced-choice procedure
$M_1 = M_2$ and therefore, by
Equation 8, it follows that $m_1 = m_2 = m$.
Using the above estimate of $m$
we predict by Equation 7b that
$p_2 = .27$, which is quite close to the
observed value of .28.
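
The arithmetic of this estimate can be retraced in a few lines. The sketch below assumes, as the worked values imply, that the asymptotic response probabilities have the form p1 = m + (1 - m)g and p2 = (1 - m)g, where `g` is a name introduced here for the quantity that Equation 5 fixes at 1/2; the function names are likewise hypothetical.

```python
def estimate_m(p1_observed, g=0.5):
    """Solve p1 = m + (1 - m) * g for m (assumed form of Equation 7a)."""
    return (p1_observed - g) / (1.0 - g)


def predict_p2(m, g=0.5):
    """Predicted Pr(A1 | S2) = (1 - m) * g (assumed form of Equation 7b)."""
    return (1.0 - m) * g


m = estimate_m(0.73)   # observed Pr(A1 | S1)
p2 = predict_p2(m)
print(round(m, 2), round(p2, 2))
```

With the observed value .73 this returns the estimate of .46 and the prediction of .27 quoted in the text.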

TABLE 1
PREDICTED AND OBSERVED RESPONSE PROBABILITIES AT ASYMPTOTE

                    Observed   Predicted
Pr(A1|S1)             .73        .73
Pr(A1|S2)             .28        .27
Pr(A1|S1A1S1)         .80        .78
Pr(A1|S1A2S1)         .76        [illegible]
Pr(A1|S1A1S2)         .73        .71
Pr(A1|S1A2S2)         .67        .68
Pr(A1|S2A1S1)         .30        .32
Pr(A1|S2A2S1)         .32        .29
Pr(A1|S2A1S2)         .26        .25
Pr(A1|S2A2S2)         .22        .22

In order to compute predictions for
the sequential statistics in Table 1,
values for $\theta$ and $N$ are required in
addition to the estimate of $m$. Several
methods may be used to estimate $\theta$
and $N$ but, for simplicity, we apply a
least squares technique. Specifically,
for $m = .46$, the following function
is defined:

$$S(\theta, N) = \sum_{i,j,k} \left\{ \Pr(A_1 \mid S_i A_j S_k) - \hat{\Pr}(A_1 \mid S_i A_j S_k) \right\}^2$$

where $\hat{\Pr}(\cdot)$ denotes the observed
values given in Table 1. Applying
the method of least squares, estimates
of $\theta$ and $N$ are obtained by selecting
values for these parameters that
minimize the function $S(\theta, N)$.

Using appropriate numerical techniques,
the following estimates were
obtained: $\theta = .62$, $N = 3.83$. The
predictions corresponding to these 
parameter values are presented in 
Table 1. When one considers that 
only three of the possible eight degrees 
of freedom represented in the table 
have been utilized in estimating pa- 
rameters, the correspondence between 
theoretical and observed quantities is 
quite good. The fact that our 
estimation procedure yields a non- 
integral value of N may suggest that 
N varies somewhat from time to time, 
or it may reflect some contamination 
of the data by sources of experimental 
error not represented in the model. 
The reader interested in other applications
of this model to sequential data
should see Atkinson (1963).
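
The text does not say which numerical technique was used for the minimization. As a minimal sketch, assuming a plain grid search: `predict` below is a hypothetical stand-in for the Equation 13 expressions (which are not reproduced here), and `least_squares_fit`, `theta_grid`, and `n_grid` are names introduced only for illustration.

```python
from itertools import product


def least_squares_fit(observed, predict, theta_grid, n_grid):
    """Grid search for the (theta, N) pair minimizing squared deviations.

    observed: dict mapping a cell label to its observed proportion.
    predict:  callable (theta, N) -> dict with the same cell labels;
              a placeholder for the model's sequential predictions.
    Returns (smallest sum of squares, best theta, best N).
    """
    best = None
    for theta, n_elements in product(theta_grid, n_grid):
        predicted = predict(theta, n_elements)
        loss = sum((predicted[cell] - observed[cell]) ** 2 for cell in observed)
        if best is None or loss < best[0]:
            best = (loss, theta, n_elements)
    return best


# Toy stand-in model, NOT the theory's equations; for demonstration only.
def toy_predict(theta, n_elements):
    return {"a": theta, "b": theta / n_elements}


print(least_squares_fit({"a": 0.6, "b": 0.2}, toy_predict,
                        [0.5, 0.6, 0.7], [2, 3, 4]))
```

A finer grid, or a general-purpose optimizer, would be needed to obtain nonintegral estimates such as the value of N reported above.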


DISCUSSION 


In some respects the theory pro- 
posed in this paper is similar to vari- 
ous applications of statistical decision 
theory to psychophysical phenomena 
(Swets, Tanner, & Birdsall, 1961; 
Tanner & Swets, 1954). The decision 
theory approach rejects the conven- 
tional notion of a threshold and argues 
for the concept of a criterion range of 
acceptance. Its proponents assume that on
each trial the reaction of the sensory
system to an external stimulus can be
characterized by a number (a likelihood
ratio) and the subject’s response
depends on whether or not the number
falls in the criterion range. The
process is not deterministic, for repeated
presentations of a stimulus
do not generate the same number but
rather a distribution of numbers (i.e.,
on a single presentation of the stimulus
a number is randomly drawn from the
distribution). The position of the
criterion (the operating level) is 
assumed to be under the control of 
the observer and to vary as a function 
of psychological variables that in- 
fluence motivation and set. Specif- 
ically, the subject fixes the operating 
level in terms of a priori probabilities 
of stimuli and the costs associated 
with the various choices in such a way 
as to maximize his expected utility. 
Translated into the language used in 
this paper, the activation process is
represented by the random sampling 
of a number from a distribution 
associated with the stimulus; and the 
decision process refers to the selection, 
by the subject, of an operating level or 
criterion. 

A principal distinction between our 
approach and signal detection theory 
is with regard to the activation
process. In our theory the sensitivity
level of the activation process may 
vary (within a given range) from trial 
to trial as a function of the preceding 
events. In contrast, signal detection 
theory conceptualizes the activation 
process as static, for the parameters 
that describe the response of the 
sensory system to an external stimulus 
are constant and do not depend on 
instructions, stimulus schedules, pay- 
offs, or other variables that might 
influence set or motivation. 

Another distinction between our 
approach and signal detectability 
theory is with regard to the decision 
process. Both theories permit varia- 
tions in the decision rule as a function 
of various independent variables but 
in quite different ways. For signal 
detection theory the subject selects a 
criterion in terms of certain game- 
theoretic considerations that take into 
account a priori probabilities of stim- 
uli and the costs associated with the 
various choices. Once the criterion 
has been selected for a given ex- 
perimental condition it is assumed to 
be relatively fixed, and consequently 
there is no possibility for predicting 
trial-by-trial sequential effects. In 
contrast, for the present theory, the 
decision process changes from trial to 
trial as a function of the type of in- 
formation that accrues to the subject. 

In discussing the decision rule it is
important to realize that we have
placed a heavy emphasis on a learning
process associated with stimuli extraneous
to the signal source (i.e.,
background cues). This learning
process plays a central role in determining
the values of $p_1$ and $p_2$ as a
function of various independent variables
and provides one means of
accounting for sequential effects in
psychophysical data. It should be
emphasized that the sequential results
predicted by Equation 13 are due
entirely to trial-to-trial changes in the
conditioning of stimuli in the background
set S*. Another source of
sequential variability can arise from
trial-to-trial fluctuations in $m_{i,n}$.
When the scan range, $\xi$, is large these
effects are negligible at asymptote;
however, for small values of $\xi$ they
can be quite important. As indicated
earlier, we have obtained good accounts
of sequential effects for several
sets of data by assuming that the
scan range is large. Further, when
$\xi \to \infty$ the mathematical analysis is
simplified. It is for these reasons that
we have been willing to begin by
making this assumption.

Without actually estimating the
value of $\xi$ one can obtain various
crude, but easily calculated, measures
of trial-to-trial fluctuations in sensitivity
(as opposed to the long-term
changes in sensitivity level described
by Equation 6). As an example,
let $C_n$ and $\bar{C}_n$ denote correct ($S_1$-$A_1$
or $S_2$-$A_2$) and incorrect ($S_1$-$A_2$
or $S_2$-$A_1$) responses on Trial $n$,
respectively. Then in a forced-choice
experiment in which $\gamma = 1/2$, the
theory in general predicts that

$$\Pr(C_{n+1} \mid C_n) \neq \Pr(C_{n+1} \mid \bar{C}_n) \qquad [14]$$

except when $\xi \to \infty$.⁴ If over an extended
series of trials estimates of
these two probabilities are significantly
different, then it will be necessary
to take into account not only
long-term changes in sensitivity level
but also the more local effects. In
this regard, it should be pointed out
that any theory of signal detection
that postulates a static activation
process has as a consequence the
prediction that

$$\Pr(C_{n+1} \mid C_n) = \Pr(C_{n+1} \mid \bar{C}_n)$$

in a forced-choice experiment with
$\gamma = 1/2$; this result holds for both a
correct information procedure and a
no information procedure.

⁴ It should be emphasized that the prediction
in Equation 14 does not depend on the
value of £ but only on the fact that $M_1 = M_2$
and $\gamma = 1/2$.
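
The two conditional probabilities compared in Equation 14 can be estimated from nothing more than a record of which trials were correct. The sketch below is illustrative only; `accuracy_after` is a name introduced here, and deciding whether the two proportions differ significantly would require a further statistical test that is not shown.

```python
def accuracy_after(correct):
    """Estimate Pr(correct on n+1 | correct on n) and
    Pr(correct on n+1 | incorrect on n) from a sequence of booleans."""
    after_correct = [correct[n + 1] for n in range(len(correct) - 1) if correct[n]]
    after_error = [correct[n + 1] for n in range(len(correct) - 1) if not correct[n]]

    def proportion(outcomes):
        # NaN signals that the conditioning event never occurred.
        return sum(outcomes) / len(outcomes) if outcomes else float("nan")

    return proportion(after_correct), proportion(after_error)


# Made-up record of seven trials, for illustration only.
record = [True, True, False, True, False, False, True]
print(accuracy_after(record))
```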

Our presentation of the theory has 
dealt with experimental situations in 
which the subject is given correct 
information on each trial regarding 
the appropriate response; i.e., 


$$\Pr(E_{1,n} \mid S_{1,n}) = \Pr(E_{2,n} \mid S_{2,n}) = 1.$$


It is obvious that the axioms, as 
stated, are directly applicable to 
problems in which the experimenter 
may give false information on some 
trials. We shall not go into the 
predictions for this type of experiment 
except to say that the theory gives a 
good account, at least at the qualita- 
tive level, of the findings reported by 
Carterette and Wyman (1963) and 
Suppes and Krasne (1961) on detec- 
tion problems in which incorrect in- 
formation was manipulated as an 
experimental variable. 

Throughout this paper, we have con- 
sidered psychophysical methods in 
which the subject is given information 
on each trial and have not dealt with 
the no information case. Under con- 
ditions of no information certain 
changes need to be made in Axioms 
A3 and L2. A discussion of this version
of the theory is given in Atkinson
(1963) and Atkinson and Estes (1963),
where it is applied to some forced-choice
visual detection data involving no
information feedback; the detailed
predictions for both asymptotic response
proportions and first-order
sequential statistics are excellent.
However, the major difficulty with 
the no information condition is that it 
makes the mathematical predictions 
less manageable and increases the 
sampling error associated with pa- 
rameter estimates. Thus, within the 
present theoretical framework, the 
study of the no information case
warrants only limited investigation
until the less complicated cases have 
been adequately explored. 

There are a number of special topics 
that have not been discussed. The 
following are of particular interest: 
the effect of blank trials in a forced- 
choice procedure; extension of the 
model to account for choice-time 
measures; and extension of the model 
to multiinterval forced-choice experi- 
ments where second choices are per- 
mitted. These problems can be 
formulated in a natural way within 
the framework of the theory and will 
be treated in later papers. 


SUMMARY 


In this paper we present an analysis 
of both yes-no and forced-choice ex- 
periments in terms of a two-process 
model. One process describes sys- 
tematic changes that may occur over 
time in the subject’s sensitivity level 
to external stimuli; the other process 
specifies changes in the subject's 
decision rule as information accrues 
to him. From the theory one can 
derive predictions regarding both 
gross statistics like receiver-operating- 
characteristic curves and detailed 
sequential statistics like autocorrela- 
tions based on previous stimulus- 
response events. 

Most theories of signal detection 
assume that the subject’s decision 
rule changes as a function of in- 
structions, payoffs, stimulus presenta- 
tion schedules, and other experimental 
variables, but to our knowledge the 
present paper is the first to examine 
the implications of postulating sys- 
tematic nonrandom changes in sensi- 
tivity. Undoubtedly the detailed 
features of the axioms describing 
changes in sensitivity are going to 
need much revision to provide a broad
base for interpreting psychophysical
phenomena. Nevertheless, it seems
clear that by assuming a variable 
sensitivity level one can provide a 
highly parsimonious account of a 
wide array of phenomena. No sug- 
gestions have been offered regarding 
the mechanism that might account 
for changes in sensitivity (e.g., orient- 
ing responses, peripheral changes 
within the sensory system, or events 
presumed to occur at higher centers) 
and future exploration of the concept 
may require such specificity. 

Another unique aspect of the pres- 
ent development is its emphasis on 
sequential phenomena. These effects 
can be easily estimated in most 
experiments and represent a source of 
information about detection behavior 
that cannot be duplicated by an 
analysis of gross statistics like the 
proportion of hits or false alarms. 
Within the present theory, sequential 
effects are accounted for in terms of 
trial-by-trial fluctuations in both the 
decision rule and the sensitivity level. 
Predictions regarding sequential phe- 
nomena play a crucial role in evaluat- 
ing the theory. In the past, most 
investigators either have ignored these 
sequential effects or treated them as 
experimental artifacts to be minimized 
by counterbalancing, trial spacing, or 
by the use of trained subjects. 

Much research is needed to test the 
general class of models suggested by 
the theory. However, in our opinion, 
there is enough evidence already 
available to suggest that the concept 
of a variable sensitivity level will be a 
necessary ingredient of a compre- 
hensive theory of detection behavior. 
Also, it is hoped that the present 
paper has emphasized the importance 
of examining trial-by-trial sequential 
phenomena as a source of information 
about the perceptual process.


THEORETICAL NOTES


FURTHER CONSIDERATIONS ON TESTING THE NULL
HYPOTHESIS AND THE STRATEGY AND TACTICS
OF INVESTIGATING THEORETICAL MODELS


ARNOLD BINDER 
Indiana University 


David A. Grant has argued that it is inappropriate to design experi- 
ments such that support for a theory comes from acceptance of the null 
hypothesis. The present article points out that while this position could 
be defended in Fisher's approach to testing statistical hypotheses, it 
could not in the Neyman-Pearson approach or on more general scien- 
tific grounds. It is emphasized that one optimally designs experiments 
with enough sensitivity for rejecting poor theories and accepting useful 
theories, whether acceptance or rejection of the null hypothesis leads 
to empirical support. The argument that, in the procedure to which 
Grant objects, an insensitive experiment is more likely to lead to support
for a theory is shown to be only a special case of the argument
against bad experimentation.


The arguments in a recent article by 
Grant (1962) are directed against ex- 
perimental designs oriented toward ac- 
ceptance of the null hypothesis, that is, 
where support for an empirical hypothesis 
depends upon acceptance of the null hy- 
pothesis. Atkinson and Suppes (1958) 
provide an excellent example of the type 
of experimental logic to which Grant objects.
These investigators postulated a
one-stage Markov model for a zero-sum,
two-person game. On the basis of the
model they predicted, first, the mean proportion
of various responses over asymptotic
trials and, second, that the probability
of State $k$ given States $i$ and $j$ on
the two previous trials is equal to the
probability of State $k$ given only State $j$
on the immediately preceding trial (i.e.,
that a one-stage Markov model accounts
for the data). The predictions were then
compared with the obtained results by
means of a series of $t$ tests, in the former
case, and a $\chi^2$ test, in the latter. One of
the $t$ tests, for example, involved a comparison
of the predicted proportion of
.600 against the observed mean proportion
of .605, while another involved a comparison
of a predicted value of .667 and an observed
value of .670. Support for the
one-stage Markov model was then inferred
from the failure of the $t$ tests and the
$\chi^2$ to reach the .05 level of significance.
That is, support for the empirical model 
came from acceptance of the null hypothe- 
ses. Other examples may be found in 
Binder and Feldman, 1960; Bower, 1962; 
Brody, 1958; Bush and Mosteller, 1955; 
Grant and Norris, 1946; Harrow and 
Friedman, 1958; Weinstock, 1958; and 
Witte, 1959. 
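
The one-stage Markov prediction mentioned above amounts to saying that conditioning on two previous states gives the same answer as conditioning on one. A minimal sketch of how the two sets of conditional proportions could be tabulated from a single state sequence follows; `conditional_estimates` is a hypothetical name, and the sketch does not carry out the $\chi^2$ test used in the original study.

```python
from collections import Counter


def conditional_estimates(states):
    """For each observed triple (i, j, k), return the pair
    (estimate of Pr(k | i, j), estimate of Pr(k | j))."""
    pairs = Counter(zip(states, states[1:]))
    triples = Counter(zip(states, states[1:], states[2:]))
    from_j = Counter()    # number of transitions leaving state j
    for (j, _k), count in pairs.items():
        from_j[j] += count
    from_ij = Counter()   # number of transitions leaving the pair (i, j)
    for (i, j, _k), count in triples.items():
        from_ij[(i, j)] += count
    return {
        (i, j, k): (count / from_ij[(i, j)], pairs[(j, k)] / from_j[j])
        for (i, j, k), count in triples.items()
    }


# Made-up alternating sequence, for illustration only.
print(conditional_estimates([1, 2, 1, 2, 1, 2]))
```

Under a one-stage Markov model the two numbers in each pair should agree apart from sampling error.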

To facilitate future discussion it is con- 
venient to refer to the procedure where 
acceptance of the null hypothesis leads to 
support for an empirical hypothesis as 
acceptance-support (a-s), and to the pro- 
cedure where empirical support comes 
from rejection of the null hypothesis as 
rejection-support (r-s). 

In addition to the objections to a-s, 
Grant argues that the method of testing 
statistical hypotheses may not be a very 
good idea in any case. He thus argues 
it is wise to shift away from the current 
emphasis in psychological research on 
hypothesis testing in the direction of sta- 
tistical estimation. 




STATISTICAL LOGIC 


There have been two principal schools 
of thought in regard to the logical and 
procedural ramifications of statistical in- 
ference. The older of these stems from 
the writings of Yule, Karl Pearson, and 
Fisher, while the other comes from the 
early work of Neyman and Pearson and the 
more recent developments of Wald. The 
respective influences of these
schools on experimental statistics are
abundantly evident, but a difficulty in
separating these influences is that the 
actual recommendations for tests and in- 
terval estimates in a field like psychology 
are similar for both. 

In the Fisher school one starts the 
testing process with a hypothesis, called 
the “null hypothesis,” which states that 
the sample at issue comes from a hypo- 
thetical population with a sampling dis- 
tribution in a certain known class. Using 
this distribution one rejects the null hy- 
pothesis whenever the discrepancy be- 
tween the statistic and the relevant 
parameter of the distribution of interest 
is so large that the probability of obtain- 
ing that discrepancy or a larger one is 
less than the quantity designated $\alpha$ (the
significance level). No clear statement 
is provided for the manner in which the 
null hypothesis is chosen, but the tests 
with which Fisher (1949) has been as- 
sociated are in the form where the null 
hypothesis is equated with the statement 
“the phenomenon to be demonstrated is 
in fact absent” (p. 13). 

The concept “rejection of the null hy- 
pothesis” is therefore unambiguous in the 
context of Fisher’s viewpoint, but what 
about “acceptance of the null hypothesis?” 
Fisher (1949) provides the following 
statement “the null hypothesis is never 
proved or established, but is possibly 
disproved, in the course of experimenta- 
tion. Every experiment may be said to 
exist only in order to give the facts a 
chance of disproving the null hypothesis” 
(p. 16). This is not very edifying since 
one does not expect to prove any hypothe- 
sis by the methods of probabilistic infer- 
ence. Hogben (1957) has interpreted 
these and similar statements of the Yule-Fisher
group to mean that a test of significance
can lead to one of two decisions:
the null hypothesis is rejected at the $\alpha$
level or judgment is reserved in the ab- 
sence of sufficient basis for rejecting the 
null hypothesis. 

Papers by Neyman and Pearson (1928a, 
1928b) pointed out that the choice of a 
statistical test must involve consideration 
of alternative hypotheses as well as the 
hypothesis of central concern. They introduced
the distinction between the error
of falsely rejecting the null hypothesis 
and the error of falsely accepting it (re- 
jecting its alternative). Neyman and 
Pearson’s (1933) general theory of hy- 
pothesis testing, based on the concepts 
Type I error, Type II error, power, and 
critical region, was presented later. 

The possible parameters for the distri- 
bution of the random variable or variables 
in a given investigation are conceptually 
represented by a set of points in what is 
called a parameter space. This space is 
considered to be divided into two or more 
subsets, but we shall restrict our present 
discussion to the classical case in which 
there are exactly two subsets of points. 

The statistical hypothesis specifies that 
the parameter point lies in a particular 
one of these two subsets while the alterna- 
tive hypothesis specifies the other subset 
for the point. A statistical test is a pro- 
cedure for deciding, on the basis of a set 
of observations, whether to accept or re- 
ject the hypothesis. Acceptance of the 
hypothesis is precisely the same as de- 
ciding that the parameter point lies in the 
set encompassed by the hypothesis, while 
rejection of the hypothesis is deciding 
that the point lies in the other subset. A 
typical test procedure assigns to each pos- 
sible value of the random variable (sta- 
tistic) one of the two possible decisions. 
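
The idea of a test as a rule assigning one of two decisions to every value of the statistic, together with its power, can be made concrete. The sketch below is an illustration under stated assumptions rather than any procedure from the studies cited: it uses a two-sided z test with a known standard deviation, and the values of `sigma`, `n`, and the alternative .65 are invented for the example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def decide(sample_mean, null_value, sigma, n, alpha=0.05):
    """Assign 'accept' or 'reject' for the hypothesis mean == null_value."""
    z = (sample_mean - null_value) / (sigma / n ** 0.5)
    critical = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return "reject" if abs(z) > critical else "accept"


def power(true_value, null_value, sigma, n, alpha=0.05):
    """Probability of rejection when the population mean is true_value."""
    critical = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = (true_value - null_value) / (sigma / n ** 0.5)
    return STD_NORMAL.cdf(-critical - shift) + 1 - STD_NORMAL.cdf(critical - shift)


# Illustrative numbers only: sigma and n are assumed values.
print(decide(0.605, 0.600, sigma=0.10, n=20))
for n in (10, 40, 160):
    print(n, round(power(0.65, 0.60, sigma=0.10, n=n), 3))
```

The Type II error probability at a given alternative is one minus the power, so a small sample makes acceptance of the hypothesis likely even when the alternative is true.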

Sets of distributions (or their associ- 
ated parameters), in this mathematical 
model, may be considered to correspond 
to the explanations in the empirical world 
which may account for the possible outcomes
of a given experiment. Empirical
hypotheses, which specify values or relationships
in the scientific world, are
translatable on this basis into statistical
hypotheses. But the distinction between
empirical and statistical hypotheses is 
quite important: the former refer to sci- 
entific results and relationships, the latter 
to subsets of points in a parameter space; 
they are related by a set of correspon- 
dences between scientific events and 
parameter sets. 

The term “null hypothesis” does not oc- 
cur in the writings of many of the ad- 
vocates of the Neyman-Pearson view. 
Except for one pejorative footnote I was 
unable to find the term used by Neyman 
(1942), for example, in any of an ex- 
tensive array of his publications. In 
general, these people prefer the term 
“statistical hypothesis” or simply “hypoth- 
esis” in designating the subset of central 
concern and alternative hypothesis for 
the other subset. However, null hypothesis
has taken on meaning over the years
in the context of the Neyman-Pearson 
tradition among many writers of sta- 
tistics, particularly those with expository 
proclivities. In the Dictionary of Statisti- 
cal Terms (Kendall & Buckland, 1957) 
we find the following definition for null 
hypothesis: “In general, this term relates 
to a particular hypothesis under test, as 
distinct from the alternative hypotheses 
which are under consideration. It is 
therefore the hypothesis which determines 
the Type I Error” (p. 202).


AN EVALUATION¹


Grant’s position in regard to a-s is certainly
not new or novel since it has been
implicit in the writings of Fisher for the
past 25 years. Moreover, it has been part
of the folklore of statistical advising in
psychology at least as far back as my initial
exposure to psychological statistics
(see Footnote 2). And, in fact, if Grant
wishes to argue that his position holds
only in the very narrowest interpretation
of the Yule-Karl Pearson-Fisher structure,
I see no grounds for contesting it.
If there are only two possible decisions
(reject the null hypothesis or reserve
judgment) one would surely not wish to
equate the null hypothesis with the empirical
hypothesis designating a specific
value. Using this logic an investigator
could just as well discard as retain a
theory when it has led to perfect predictions
over a wide range.

In this context I would like to point out
that there are many logical difficulties connected
with the Fisher formulations which
have been brought out dramatically in
years of debate (Fisher, 1935, 1950, 1955,
1959, 1960; Neyman, 1942, 1952, 1956,
1961). Moreover there are some people
who, while generally sympathetic with
the Fisher viewpoint, are quite willing to
accept the null hypothesis and conclude
that this provides support for an empirical
hypothesis (Mather, 1943; Snedecor,
1956).


¹ There is a third viewpoint, represented in
the psychological literature by Rozeboom’s
(1960) recent article, from which Grant's
position could be evaluated. This viewpoint
emphasizes the importance of the a posteriori
probabilities of alternative explanations, in
the Bayes sense, rather than the decision aspects
of experimentation. However, the
philosophical and practical problems of this
approach remain enormous as is evident in
the debates on this and related topics over
the years. See, for example, Jeffreys (1957),
Neyman (1952), Hogben (1957), Savage
(1954), Chernoff and Moses (1959),
von Mises (1942, 1957), and particularly
Parzen (1960) who discusses the dangers of
using Bayesian inverse probability in applied
problems. It is typically not the case in
basic research that one can assume that an
unknown parameter is a random variable
with some specified a priori distribution, and
in such cases this approach does not presently
provide any adequate answers to the problems
of hypothesis evaluation.

While of a markedly different philosophical
persuasion than the present writer, Rozeboom
(1960) is equally unsympathetic with
the inferential bias represented by Grant.
He cuts into an essential component of this
bias in the following succinct and effective
manner,

    Although many persons would like to conceive
    NHD [the null hypothesis decision
    procedure] testing to authorize only rejection
    of the hypothesis, not, in addition,
    its acceptance when the test statistic fails
    to fall in the rejection region, if failure to
    reject were not taken as grounds for acceptance,
    then NHD procedure would involve
    no Type II error, and no justification
    would be given for taking the rejection
    region at the extremes of the distribution,
    rather than in its middle (p. 419).

In the pursuit of evaluating Grant's 
position from the Neyman-Pearson the- 
ory we must remember that the null hy- 
pothesis is a statistical hypothesis which 
designates a particular subset of parameter
points. Moreover, the null hypothesis
and the alternative hypothesis (the
other subset) are mutually exhaustive so 
that rejection of the one implies accept- 
ance of the other; acceptance of a hy- 
pothesis being the belief, at a certain 
probability level, that the subset specified 
by the hypothesis includes the parameter 
point. There can be no question about 
the legitimacy or acceptability of accept- 
ance of the null hypothesis within this 
purely mathematical scheme since accept- 
ance and rejection are perfectly comple- 
mentary. 

Consequently any interpretive difficulties
which result from accepting the null
hypothesis must be in the rules for or 
manner of relating empirical and statistical
(null) hypotheses. The null hypothesis
is of course that hypothesis for which
the probability of erroneous rejection is
fixed at α (or set at a maximum of α);
the test (critical region) is chosen so as
to maximize power for the given α and
the alternative hypothesis. Since therein
lies the only feature of the process that 
differentiates the null hypothesis from the 
other subset, the relating of empirical and 
statistical hypotheses must be based upon 
it. 
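
As a hedged illustration of the asymmetry just described, the following Python sketch fixes α for the null hypothesis and then computes the power (and hence the Type II error rate) at one alternative. The two-sided z test, the unit standard deviation, the shift of 0.5, and n = 25 are illustrative assumptions, not values taken from the text.

```python
from statistics import NormalDist


def two_sided_z_test(alpha, true_shift, sigma, n):
    """Return (critical z, rejection probability) for H0: shift = 0.

    Assumes a known sigma and a normally distributed sample mean.
    """
    std_normal = NormalDist()
    # The critical region is fixed by alpha alone.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under a true shift the test statistic is N(ncp, 1).
    ncp = true_shift * n ** 0.5 / sigma
    reject = std_normal.cdf(-z_crit - ncp) + 1 - std_normal.cdf(z_crit - ncp)
    return z_crit, reject


# Hypothetical values chosen only for illustration.
z_crit, power_at_null = two_sided_z_test(0.05, 0.0, 1.0, 25)
_, power_at_alt = two_sided_z_test(0.05, 0.5, 1.0, 25)
beta = 1 - power_at_alt  # Type II error rate at this alternative

print(f"critical z = {z_crit:.3f}")
print(f"rejection rate under H0 = {power_at_null:.3f}")
print(f"power at shift 0.5 = {power_at_alt:.3f}, beta = {beta:.3f}")
```

Only the rejection rate under the null hypothesis is fixed in advance; the error rate under the alternative depends on the sample size and on the size of the true shift.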

While there are no firm rules for de- 
ciding with which of the two subsets a 
given empirical hypothesis should be as- 
sociated, there have been certain prac- 
tices or conventions used by different 
writers. Neyman (1942), for example, 
suggested a most reasonable convention 
for relating empirical and statistical hy- 
potheses which is to equate with the null 
hypothesis that empirical hypothesis for 
which the error of erroneous rejection is 
more serious than the error of erroneous 
acceptance so that the more important 
error is under the direct control of the
experimenter. There are a few other conventions
based upon the derivational advantages
of fixing α for a simple (rather
than a composite) hypothesis, but it is
quite clear that Grant has not merely restated
any of these. In fact, Grant's
(1962) strong statement that “using these
predictions as the values in H₀ [the null
hypothesis] is tactically inappropriate,
frustrating, and self-defeating” (p. 61)
indicates that his position is much more 
than a convention of convenience. 

The position which I will develop over
the remainder of this paper is not that 
a-s is preferable to r-s, but that there 
are no sound foundations for damning a-s. 
In this process let me initially point out 
that one can be led astray unless he recog- 
nizes that when one tests a point predic- 
tion he usually knows before the first 
sample element is drawn that his empiri- 
cal hypothesis is not precisely true. Con- 
sider testing the hypothesis that two 
groups differ in means by some specified 
amount. We might test the hypothesis 
that the difference in means is 0, or per- 
haps 12, or perhaps even 122.5. But in 
each case we are certain that the differ- 
ence is not precisely 0.0000 . . . ad inf., 
or 12.0000 . . ., or 122.50000 . . . ad inf.

Recognition of this state of affairs
leads to thinking in terms of differences
or deviations that are or are not of importance
for a given stage of theory construction
or of application. Some express
this in terms of differences which do and
do not have practical importance, but I
prefer the term zone of indifference, which
is used with important implications in
sequential analysis. That is, if, for example,
the difference in mean performance
between two groups is less than, say,
ε, the two means may be considered equivalent
for the given stage of theoretical development.
In the case of a prediction of
one-third for the proportion of right turns
of rats in a maze, one would expect the
same courses of action to be followed if
the figure were actually .334 or .335.
Thus, although we may specify a point
null hypothesis for the purpose of our
statistical test, we do recognize a more or
less broad indifference zone about the
null hypothesis consisting of values which
are essentially equivalent to the null hypothesis
for our present theory or practice.

While the formal procedures for testing
statistical hypotheses are based upon
the assumption that the sample size (n)
is fixed prior to consideration of alternative
test procedures, the user of statistical
techniques is faced with the problem
of choosing n and does so with regard
for the magnitude of the discriminations
which are or are not important for his
particular application or level of theory
development. In the typical case we
choose the conditions of experimentation,
including sample size, such that we will
reject the null hypothesis with a given
probability when the parameter difference
is a certain magnitude. This is frequently
done very formally in fields like agriculture,
although rather informally in psychology.
For example, in Cochran and
Cox (1957) there is an extended discussion
of the procedures for choosing
the number of replications for an experiment
on the basis of the practical importance
of true differences. Thus, in
one of their examples, a difference of 20%
of the mean of two values is considered
sufficiently important to warrant a sensitive
enough experiment to have an .80
probability of detecting it; that is, if the
difference is 20% a large enough n is
desired to insure that the power of the
test is .80. Although it may happen that
the required sample size is a function of
an unknown distribution and not determinable
in advance, it can usually be approximated
with the tests used most frequently
by psychologists.
The choice of sample size is but one
feature in the overall planning to obtain
an experiment of the desired precision
with due consideration for the level of
theory development (including alternate
theories), the zones of indifference, and
the related consequences of decision.
However, such other features as the
standard error per unit observation and
the design efficiency do not have the
flexibility of sample size, and, moreover,
are usually chosen to maximize precision
for reasons of economy. The choice of
optimum sample size applies to all experimental
strategies, including the nonobjectionable
(to Grant) and more usual
r-s. It is surely apparent that anyone
who wants to obtain a significant differ- 

badly enough can obtain 
a useful, though inaccurate theory; in r-s


though inaccurate theory (that is, to ac- 
cept the null hypothesis which implies re- 
jection of its alternative). The identical 
terms were chosen in the preceding sen- 


sirability of a test that is neither too 
stringent nor too insensitive. Whether 
or not the experiment is precise enough
is, then, a function of theoretical and 
practical consequences, and not of whether 
acceptance or rejection of the null hy- 
pothesis leads to support for an empirical 
theory. 

But, one may argue, while there is logi- 
cal equivalence as stated above, there is 
not motivational equivalence. That is, 
while it is agreed that ideally investiga- 
tors design their experiments (including 
their choice of sample sizes) in order to 
be reasonably certain of detecting only 
differences which are of practical or theo- 
retical importance, in actual practice they 
are neither so wise nor so pure as to be 
influenced by these factors to the exclu- 
sion of social motivations. And it is in- 
deed much easier to do insensitive rather 
than precise experimentation. This phe- 
nomenon is of course what Grant (1962) 
referred to in his statement, 


The tactics of accepting H₀ as proof and rejecting
H₀ as disproof of a theory lead to
the anomalous results that a small-scale, 
insensitive experiment will most often be 
interpreted as favoring a theory, whereas a 
large-scale, sensitive experiment will usually 
yield results opposed to the theory! (p. 56). 


Perhaps that reflects the essential point 
of Grant’s presentation—merely to cau- 
tion imprudent experimenters that the 
combination of personal desire to es- 
tablish one’s hypothesis and the ease 
of performing insensitive experimenta- 
tion produce a particularly troublesome 
interaction. 

Before proceeding it should be remem- 
bered that scientific considerations may 
be made secondary to personal desires to 
establish a theory whether the procedure 
be a-s or r-s in a perfectly analogous 
fashion. The only difference involves 
such practical considerations as the fact 
that it is usually easier to run 5 or 10 
subjects than 100 or 500. 
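
The dependence of the outcome on sample size can be made concrete with a short sketch. It assumes a two-sided z test at α = .05 with known unit standard deviation, and a hypothetical true discrepancy of 0.3 standard deviations between prediction and parameter; none of these values comes from Grant's article.

```python
from statistics import NormalDist


def prob_accept_null(true_diff, sigma, n, alpha=0.05):
    """Probability that the test statistic falls outside the rejection region."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # With a true discrepancy the statistic is N(ncp, 1).
    ncp = true_diff * n ** 0.5 / sigma
    return std_normal.cdf(z_crit - ncp) - std_normal.cdf(-z_crit - ncp)


# Sample sizes mentioned in the text; the discrepancy of 0.3 is assumed.
for n in (5, 10, 100, 500):
    print(n, round(prob_accept_null(0.3, 1.0, n), 3))
```

With the same true discrepancy, the small experiments accept the null hypothesis most of the time and the large ones almost never do. The same function applies whether acceptance or rejection is the outcome that supports the investigator's theory, which is the sense in which insensitivity, rather than a-s as such, is the source of the difficulty.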

If Grant (1962) merely intended his 
article to convey this obvious warning, I 
cannot understand the discussions which 
involve such statements as the following: 


Unfortunately most of the procedures used 
to date in testing the adequacy of such theoretical
predictions [from mathematical
models] set rather bad examples. Probably
the least adequate of these procedures has
been that in which an H₀ of exact correspondence
between theoretical and empirical
points is tested against H₁ covering any
discrepancy between predictions and experimental
results (p. 55).


If one is pointing out the dangers of using 
insensitive a-s tests (rather than con- 
demning a-s on logical grounds), one 
would be expected to object to a particular 
or general use of a-s only if the use in- 
volved insensitive tests. Thus, it might 
be argued that experimenter WKE ob- 
tained support for his quantitative pre- 
diction by the use of a-s with a test so 
insensitive that it could not reasonably 
detect important discrepancies between 
predictions and observations. Or, as an- 
other fictional example, it might be stated 
that RRB always used a-s and always 
found support for his linear models, but 
his n was uniformly less than 5. But, 
unless there were almost uniform use of 
insensitive tests with a-s, this cautionary 
position could not reasonably lead to a 
condemnation of a-s. 

As I see it, moreover, the argument 
against insensitive a-s tests is nothing but 
a particular form of the more general 
argument against bad experimentation. 
It is unquestionably the case that an a-s 
experiment that is too small and insensi- 
tive is poor, but the poorness is a property 
of the insensitivity and not of the a-s pro- 
cedure. An r-s experiment that is too 
small and insensitive is equally poor. Due 
to the interaction between personal 
achievement desires and the ease of sloppy 
experimentation, as referred to previ- 
ously, it may be necessary to be particu- 
larly alert to the usual scientific safe- 
guards when using a-s, but that is a trivial 
matter and hardly worthy of an article. 

In summary, it would be perfectly jus- 
tifiable to argue that n is too small (or 
even too large) for a particular degree 
of sensitivity required at a given level of 
scientific development, but that is far
from a proscription of designs where ac- 
ceptance of the null hypothesis is in some 
way to the experimenter’s social or per- 
sonal advantage. 


GRANT'S POSITION FROM THE VIEWPOINT
OF SCIENTIFIC DEVELOPMENT 


In the process of concluding this dis- 
cussion I would like to emphasize and 
expand on certain factors which seem 
most critical in the process of evaluating 
scientific theories, as well as to indicate 
that my objections to Grant’s apparent 
position are justifiable beyond the con- 
fines of the Neyman-Pearson theory. 

It is surely clear that at various phases 
in the development of a scientific field 
one is faced with the problem of deciding 
about the suitability of different theories. 
When a discipline is at an early stage of 
development, knowledge of empirical re- 
lationships is crude so that broad isola- 
tion of explanatory constructs may be the 
most that is obtainable. At this stage one 
might consider as a significant accom- 
plishment the ruling out of the hypothe- 
sis that observed differences are chance 
phenomena. The empirical hypothesis of 
central concern would be that there is 
some relationship of unknown magnitude, 
while its alternative would be the chance 
or noise explanation. 

With increasing sophistication in the 
discipline the alternative hypotheses may 
represent different, but more or less 
equally well-developed theories. One does 
not choose between theory and chance, 
but between theory and theory or between 
theory and theories. Another aspect of 
increased sophistication is frequently the 
greater precision in the prediction of 
empirical results for the various theories. 

The decision as to which of the the- 
ories is admissible on the basis of the 
available data may be accomplished di- 
rectly within the Neyman-Pearson frame- 
work, but that is not necessarily the case. 
Sometimes the choice among theories de- 
pends upon a succession of tests of hy- 
potheses or possibly even upon quite in- 
formal considerations; as an example of 
the latter, one theory may lead to a pre- 
diction which is perfectly in accord 
(within rounding errors) with the ob- 
servations while the other theory is off 
by quite a margin—a statistical test 
would be considered foolish indeed. In 
disciplines that have markedly smaller observational
variability than psychology
the most common procedure consists of
a subjective comparison between predictions
and observations. Moreover, the
point that one chooses among alternative 
hypotheses at various stages of scientific 
development (whether by statistical meth- 
ods or otherwise) most certainly does not 
imply that his efforts stop once he has 
accepted or rejected a given hypothesis 
as Grant implies; if the accepted theory, 
for example, is of any interest he pro- 
ceeds to make finer analyses and compari- 
sons which may range from orthogonal 
subcomparisons in the analysis of vari- 
ance to intuitive rumination. This pro- 
vides a basis for objecting to Grant's 
arguments to the effect that hypothesis 
testing should be replaced (not supple- 
mented) by estimation. The point is that 
both are usable, but at different phases of 
investigation. 

I will again refer to the Atkinson and 
Suppes (1958) experiment to illustrate 
the relative roles of hypothesis testing 
and subsequent analysis in scientific ad- 
vancement. Their first strategy was to 
decide which of two theories—game the- 
ory or the Markov model—was most ade- 
quate in the given experimental context. 
This clearly was a problem of testing hy- 
potheses; a choice had to be made and 
the procedures of estimation could at best 
provide a substage on the way to the 
decision. The Markov model was ac- 
cepted and game theory rejected, as noted 
above, but this certainly did not lead to a 
cessation of activity. Instead the investi- 
gators initially compared theoretical and 
observed transition matrices (and found 
them distinctly different), they then tested 
the more specific hypothesis of a one-stage 
Markov model against the alternative of 
a two-stage model, and finally they in- 
vestigated the stationarity of the Markov 
process. 

During its early phases, Einstein’s gen- 
eral theory of relativity was equivalent 
to Newtonian theory in the success of ex- 
plaining various common phenomena and 
a choice between them could not be made. 
But the Einstein theory led to certain 
predictions differing from Newtonian and 
these in turn led to a series of “crucial”
tests. Among these were the exact pre- 
dictions as to the magnitude of the bend- 
ing of a light ray from a star by the 
gravitational field of the sun and the shift 
of wavelength of light emitted from atoms 
at the surface of stars. The general the- 
ory of relativity, thus, led to predictions 
which differed from the predictions of 
the alternative theory (Newton's), and 
the ultimate correspondence between these 
predictions and empirical observations 
(acceptance of no difference between pre- 
dicted and obtained results) led to support 
for general relativity. While agreements 
between theory and observational results 
have been close they certainly have not 
been perfect—even physicists have prob- 
lems of measurement precision and intri- 
cacy of mathematical derivation. But to 
the best judgment of the scientists the 
closeness of the fit between predictions 
and observations warrants the conclu- 
sion that the data provide support for the 
theory. Surely, however, despite its tre- 
mendous power, physicists do not claim 
that Einstein’s general theory has been 
proved nor are they convinced that it will 
not be ultimately replaced by a better 
theory. 

It does not seem reasonable to argue 
that this method of scientific procedure 
is not suitable for psychology—just be- 
cause our measurement precision happens 
to be lower than in physics and we use 
statistical tests rather than purely obser- 
vational comparison. 


A NOTE ON THE EFFECTS OF SCORE TRANSFORMATIONS
IN Q AND R FACTOR ANALYSIS TECHNIQUES ¹


CRAIG MacANDREW AND EDWARD FORGY 
School of Medicine, University of California, Los Angeles 


In a recent article in this journal ad- 
dressed to the effects of score transfor- 
mations in Q and R factor analysis tech- 
niques, Broverman (1961) claims to have 
presented “. . . new arguments and em- 
pirical evidence of important differences 
not previously recognized between the 
two techniques as they are commonly 
employed.” These differences are said 
to include “differences in results, differ- 
ences in the type of inference possible 
from the results, and consequently differ- 
ences in the theoretical implications of 
the data’—in a word, “differences... 
of the greatest magnitude for research in 
psychology.” 

The present note is intended to demon- 
strate that the differences Broverman ob- 
tained from his Q and R factor analyses 
do not constitute the empirical warrant 
for such claims. It will be shown that 
when Broverman’s own data are rean- 
alysed and the appropriate comparisons 
are made, a clear and understandable re- 
lationship emerges between the results 
of Q and R factor analyses “as they are 
commonly employed.” 


BROVERMAN’S REASONING


Broverman correctly observes that 
Burt’s (1937) proof of the exact trans- 
posability of person and test factors does 
not apply to Q and R factor analyses “as 
they are commonly employed.” He goes 
far beyond this, however, when he asserts 
that factor solutions obtained from a Q 
analysis (correlating people) are not even
approximate transpositions of factors produced
by an R analysis (correlating
tests). Before proceeding, we would
note that Broverman tended to use the
terms “standardized” and “centered” interchangeably.
In what follows, we shall
maintain the convention wherein the term
standardized designates the case in which
both the means and the standard deviations
of a series of columns or rows are
equated. The term centered, which will
not enter into the present discussion,
would be reserved for the case in which
only the means are equated.

¹ This work was supported in part by the
Division of Alcoholic Rehabilitation, California
State Department of Public Health,
under Contract No. 100. The data analyses
were conducted on the facilities of the
Western Data Processing Center, Graduate
School of Business Administration, University
of California, Los Angeles.

Broverman’s empirical “demonstration”
of this lack of transposability focuses on 
the different R analysis results obtained 
from a matrix of data in raw score form 
and the same matrix of data first stand- 
ardized by columns then by rows, i.e., ap- 
proximately doubly standardized. His 
reasoning is as follows: (a) A raw data 
matrix is the type upon which R analyses 
are customarily performed. (b) A doubly 
standardized matrix yields transposable 
test and person factors. (c) Q factors 
from a column-then-row standardized data 
matrix are identical to Q factors obtained 
from the more usual column standardized 
matrix by virtue of the fact that, in the 
latter case, the process of correlating rows 
accomplishes, in effect, the row standardi- 
zation of the already column standardized 
data. (d) Thus if the pattern of R factor 
loadings obtained from a raw data matrix 
can be shown to be markedly different 
from the pattern of R factor loadings ob- 
tained from such an approximately doubly 
standardized matrix, then because double 
standardization “. . . radically alters the 
R factor solution . . .” it follows from the
above points that “. . . R and Q analysis
techniques, as commonly performed, are
different and give dissimilar results.”


TABLE 1
R FACTOR LOADINGS OBTAINED FROM BROVERMAN’S DATA BY MEANS OF CUMULATIVE
CLUSTER ANALYSIS (CCA) AND PRINCIPAL COMPONENTS ANALYSIS (PCA)

                |     Factors obtained from raw data matrix      | Factors obtained from
                |                                                | approximately doubly
                |                                                | standardized data matrix
Body            |     CCA      | PCA (unrotated) | PCA (rotated) |    CCA    |    PCA
measurement     |   I     II   |   I’      II’   |   I”     II”  |     A     |     A’
----------------+--------------+-----------------+---------------+-----------+----------
Height          |  953   -085  |  529    -819    |  957   -206   |   -945    |   -916
Arm length      |  960   -005  |  587    -776    |  967   -134   |   -832    |   -873
Foot size       |  939    139  |  729    -645    |  976    060   |   -664    |   -751
Neck size       |  021    910  |  743     591    |  108    947   |    584    |    750
Waist           | -123    931  |  651     712    | -043    968   |    865    |    845
Weight          | -267    974  |  556     820    | -187    977   |    930    |    951

Note.—Decimals omitted. 


BROVERMAN’S EMPIRICAL EXAMPLE


In evidence Broverman presents the 
first two R factors which he extracted by 
Tryon’s method of cumulative cluster 
analysis from a matrix of raw data 
(Broverman, 1961, p. 70, Table 1) and 
the first R factor which he extracted, 
again by means of a cumulative cluster 
analysis, from the same data matrix first 
standardized by columns, then by rows, 
i.e., approximately doubly standardized
(Broverman, 1961, p. 72, Table 6). 
Broverman’s factors are contained in 
Table 1 of the present paper as Factors 
I and II, and Factor A, respectively. 

It will be observed that while Brover- 
man obtained two well-defined unipolar 
factors from the raw data, the approxi- 
mately doubly standardized data resulted 
in a single bipolar factor. It is this result 
which Broverman takes as the empirical 
warrant for his inferential and theoretical 
elaborations.


A REANALYSIS OF BROVERMAN’S EXAMPLE 


It is our contention that this result is 
actually due to a peculiarity of Brover- 
man’s chosen method of factor extraction. 
Specifically, as Broverman (1961) himself
has noted, Tryon’s method of cumulative
cluster analysis “proceeds directly
to an approximate simple structure solution”
(p. 69). That is, the factors extracted
by this method are already rotated.

Our task then is to demonstrate the 
consequences of using rotated rather than 
unrotated factors. For purposes of such 
a demonstration, the present authors con- 
ducted a principal components analysis of 
both of Broverman’s data matrices. The 
results of this reanalysis are contained in 
Table 1 as Factors I’, II’, and A’. It will
be observed that in the present unrotated 
case the raw data resulted in a first factor 
composed entirely of positive loadings 
and in a second factor which was bipolar. 
The approximately doubly standardized 
data resulted in a single bipolar factor. 
Since the primary consequence of double 
standardization is the complete removal 
of the first (g) factor from the data, this 
being a simple arithmetical effect of 
equating persons’ means, it is obviously 
inappropriate to compare our Factor I’
with our Factor A’. Rather, in that g 
is also removed by the first principal com- 
ponents factor extracted from the raw 
data, the appropriate comparison is be- 
tween our second unrotated principal com- 
ponents factor obtained from the raw 
data (our Factor II’) and our single prin- 
cipal components factor obtained from the 
approximately doubly standardized data 
(our Factor A’). Contrary to Brover- 
man’s result, it will be noted that a marked 
similarity is evident between these two
factors. Not only are both factors bipolar, 
but the same variables are positively and 
negatively loaded on each. As an index 
of this pattern similarity, the correlation 
coefficient between these two arrays of 
factor loadings was computed and found 
to be over .999! 
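
The index of pattern similarity reported above can be recomputed from the rounded loadings printed in Table 1. The sketch below is a plain product-moment correlation over the six variables; the variable names are ours, and the loadings are entered in thousandths exactly as tabled, so the result can differ slightly from a value computed on unrounded loadings.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Order: height, arm length, foot size, neck size, waist, weight.
factor_ii_prime = [-819, -776, -645, 591, 712, 820]  # raw data, second PCA factor
factor_a_prime = [-916, -873, -751, 750, 845, 951]   # doubly standardized, PCA

r = pearson_r(factor_ii_prime, factor_a_prime)
print(round(r, 4))
```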

That Broverman’s result is a conse- 
quence of his having used rotated rather 
than unrotated factors is further evi- 
denced by noting that our unrotated prin- 
cipal components Factors I’ and II’ may 
be orthogonally rotated to mirror the ap- 
proximate simple structure solution ex- 
hibited by Broverman’s Factors I and II. 
(By the same token, Broverman’s factors 
could, of course, be orthogonally rotated 
to mirror ours.) The results of a 45° 
orthogonal rotation of our Factors I’ and 
II’ are presented in Table 1 as Factor I”
and Factor II”. The degree to which our 
rotated principal components factors 
mirror Broverman’s is indicated by the 
fact that the correlations between Brover- 
man’s Factor I and our Factor I” and 
between Broverman’s Factor II and our 
Factor II” were both greater than .999.
Clearly, then, the major source of the 
gross dissimilarity between Broverman’s 
two sets of factors was due to (a) his 
having used rotated factors, and (b) his 
having permitted g to be included in the 
rotation. His comparison was inappropri- 
ate because in the approximately doubly 
standardized case g was eliminated from 
the data while in the raw data case it 
was spread out over both factors. We 
thus conclude, in direct contradiction to 
Broverman, that the two R analyses— 
and by implication R and Q analyses— 
provide quite similar “conceptions of hu- 
man functioning” when the appropriate 
factors are compared. 
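
The rotation argument can also be checked from the printed table. The following sketch applies a 45° orthogonal rotation to the rounded principal components loadings (Factors I’ and II’) and correlates the rotated columns with Broverman’s cluster-analysis Factors I and II. The direction of rotation and the function names are our assumptions for illustration, and because the inputs are rounded to three places the rotated values need not match the tabled Factors I” and II” exactly.

```python
from math import cos, radians, sin, sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rotate(f1, f2, degrees):
    """Orthogonal rotation of two loading columns through the given angle."""
    t = radians(degrees)
    new1 = [cos(t) * a - sin(t) * b for a, b in zip(f1, f2)]
    new2 = [sin(t) * a + cos(t) * b for a, b in zip(f1, f2)]
    return new1, new2


# Loadings in thousandths, in the row order of Table 1.
pca_i = [529, 587, 729, 743, 651, 556]       # Factor I'
pca_ii = [-819, -776, -645, 591, 712, 820]   # Factor II'
cca_i = [953, 960, 939, 21, -123, -267]      # Broverman's Factor I
cca_ii = [-85, -5, 139, 910, 931, 974]       # Broverman's Factor II

rot_i, rot_ii = rotate(pca_i, pca_ii, 45)
r_i = pearson_r(rot_i, cca_i)
r_ii = pearson_r(rot_ii, cca_ii)
print(round(r_i, 4), round(r_ii, 4))
```

An orthogonal rotation leaves each variable’s communality on the two factors unchanged, so the rotated and unrotated solutions describe the same two-dimensional space; only the placement of the general factor differs.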


THEORETICAL NOTES 


We turn finally to Broverman’s more 
general theoretical recommendations. He 
construes the bipolar results which he 
obtained from an “ipsative” score reanaly- 
sis of some data originally analyzed in 
raw score form by Podell and Phillips 
(1959) as suggesting that “certain abili- 
ties tend to be antagonistically related 
within the individual.” From this he 
recommends a theory of “choice point” 
wherein development is viewed as “a 
series of nodal points or forks on the 
road leading to specialization in one class 
of behavior at the expense of another.” 
We have already noted that double stand- 
ardization results in the removal of g 
from the data. We would now point out 
that a necessary consequence of its removal
is that the first as well as all succeeding
unrotated factors must be bipolar.
Further, when all unrotated factors are bi- 
polar, this bipolarity cannot be rotated 
away. We thus submit that while “ipsa- 
tive” scores may have their place in cer- 
tain inquiries, Broverman is in error in 
construing bipolarity per se as substantive 
grounds either for his proposed theory 
of “choice point” or for any other. In 
so doing he is simply elaborating upon 
a statistical artifact. 
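
The arithmetic behind this artifact can be illustrated with simulated scores. The sketch below is a simplified stand-in for the procedure discussed: it equates persons’ means only (rather than carrying out full double standardization), factors the covariance matrix rather than the correlation matrix, and extracts the first unrotated component by power iteration. The data-generating model (one general factor plus two group factors) and all names are hypothetical.

```python
import random


def covariance_matrix(rows):
    """Covariances among the columns (tests) of a persons-by-tests table."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / n
             for k in range(p)] for j in range(p)]


def first_component(cov, iterations=500):
    """Leading eigenvector of a symmetric matrix by power iteration."""
    p = len(cov)
    v = [float(j + 1) for j in range(p)]  # arbitrary non-constant start
    for _ in range(iterations):
        w = [sum(cov[j][k] * v[k] for k in range(p)) for j in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


random.seed(0)
people = []
for _ in range(200):
    g = random.gauss(0, 2)       # general factor shared by all six tests
    size = random.gauss(0, 1)    # group factor for tests 1-3
    girth = random.gauss(0, 1)   # group factor for tests 4-6
    scores = [g + size + random.gauss(0, 0.5) for _ in range(3)]
    scores += [g + girth + random.gauss(0, 0.5) for _ in range(3)]
    people.append(scores)

# Equate persons' means: subtract each person's own mean from his scores.
ipsative = [[x - sum(row) / len(row) for x in row] for row in people]

raw_pc1 = first_component(covariance_matrix(people))
ipsative_pc1 = first_component(covariance_matrix(ipsative))
print([round(x, 2) for x in raw_pc1])
print([round(x, 2) for x in ipsative_pc1])
```

Once every person’s scores sum to zero, each row of the covariance matrix among tests also sums to zero, so any component with nonzero variance has weights that sum to zero and must therefore carry both signs. The bipolarity follows from the transformation and does not depend on the data.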


COMMENTS ON THE NOTE BY MacANDREW AND FORGY


DONALD M. BROVERMAN 
Worcester State Hospital 


MacAndrew and Forgy (1963) have 
pointed out that factors obtained from 
ipsative scores are related to unrotated 
residual factors obtained from raw data. 
This observation is correct, but over- 
simplified. 

For instance, in a matrix containing 
several ipsative factors, the distribution 
of the subjects’ means may correlate with 
a particular ipsative within-individual fac- 
tor. In this case, the covariance of the 
normative variable, reflected by the dis- 
tribution of means, and the covariance of 
the ipsative variable will summate and 
be removed together as the first g factor 
by the usual centroid methods. The ipsa- 
tive factor solution, on the other hand, 
would be unaffected by this correlative 
relationship because ipsatization would 
remove only the variance associated with 
the distribution of means. Hence, the 
ipsative factor analysis solution would 
not correspond precisely to the unrotated 
residual factors obtained from the original 
matrix. 

Also, the magnitudes of loadings on the 
two types of factors need not be similar 
for the following reason: The total pos- 
sible communality within a given matrix 
of scores is fixed. Residual variances 
and loadings, therefore, are necessarily 
small when the first g factor is very large. 
On the other hand, the size of the g 
variable does not affect the magnitudes 
of factor loadings obtained from ipsative 
scores since g is absorbed in each subject's 
mean. Factoring, then, would start fresh 
with a matrix of ipsative deviation scores 
and an untouched reservoir of within-in- 
dividual communality. 

However, except in extreme instances, 
differences between the two types of fac- 
tors are minor and I agree with Mac- 
Andrew and Forgy that once g is re- 
moved, either by ipsatization or by tradi- 
tional methods, factor results will be 
similar. 


MacAndrew and Forgy (1963) have 
also pointed out, correctly, that ipsatiza- 
tion necessarily induces bipolarization. 
Also, the fact that two abilities necessarily 
lie on different sides of their mean does 
not, in and of itself, imply a functional or
antagonistic relationship between the two 
behaviors. Basing a theory of cognition 
on such evidence alone, as MacAndrew 
and Forgy have suggested, would indeed 
be capitalizing on artifact. Any hypothe- 
sis concerning functional relationships be- 
tween two such opposed clusters of be- 
haviors must necessarily come from the- 
ory or evidence external to the ipsative 
factor analysis. 

On the other hand, when a number of 
test clusters are present in a battery, 
there seems no purely mathematical rea- 
son why certain clusters should be consist- 
ently opposed to each other across sam- 
ples. Such results, when they occur, 
would offer empirical support for an a 
priori hypothesis of a functional antago- 
nism between the abilities defined in each 
cluster. In such cases, it might be noted, 
inclusion of interindividual g variance 
via rotation could easily obscure this 
within-individual relationship. 

I have, then, no fundamental disagree- 
ment with the statistical observations 
made by MacAndrew and Forgy (1963). 
We do, however, differ considerably in the 
meaningfulness and importance attached 
to factors obtained from g free, ipsative 
or residual variances. Thus, even where 
bipolarity is neither tenable nor desirable 
as an attribute of cognitive theory, I be- 
lieve an ipsative analysis might be profit- 
ably employed. Each pole of ipsative bi- 
polar factors might then be treated 
separately. The reasoning here is based 
on an assumption that it makes sense to 
distinguish “within-individual” from “be- 
tween-individual” sources of variance, and 
to evaluate each separately as is custom- 
ary in the repeated measurements analy- 
sis of variance design. Factors obtained
from within-individual variance refer to 
within-individual consistencies, and, as 
such, permit different inferences than can 
be made from the between-individual con- 
sistencies represented by g alone, or from 
an intermixture of g and residual variance 
as is common in simple structure rota- 
tions. The information concerning nor- 
mative differences between individuals is 
not lost since the subjects may still be 
categorized as “bright” or “dull” depend- 
ing upon their mean performance scores. 
Hence, quite possibly, a greater sensitiv- 
ity to data relationships may be obtained 
by carefully separating these two logically 
distinct realms of variance. 
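
The separation of the two realms of variance described above can be illustrated with a short computation. The following is only a sketch in Python under stated assumptions: the score table is hypothetical, and the names `raw_scores` and `split_variance_sources` are introduced here for illustration and do not come from the original note. Each subject's mean carries the between-individual (normative) information, while the deviations from that mean are the ipsative scores on which a within-individual analysis would operate.

```python
from statistics import mean

# Hypothetical raw scores: one row per subject, one column per test.
raw_scores = [
    [12.0, 15.0, 9.0],
    [20.0, 18.0, 22.0],
    [7.0, 11.0, 6.0],
]


def split_variance_sources(scores):
    """Split a subjects-by-tests table into its two components.

    Returns (subject_means, ipsative_scores):
      subject_means   -- between-individual level of each subject
      ipsative_scores -- each score minus that subject's own mean
    """
    subject_means = [mean(row) for row in scores]
    ipsative_scores = [
        [score - m for score in row] for row, m in zip(scores, subject_means)
    ]
    return subject_means, ipsative_scores


subject_means, ipsative_scores = split_variance_sources(raw_scores)

for m, deviations in zip(subject_means, ipsative_scores):
    # The mean could be used to label a subject "bright" or "dull";
    # the deviations describe the profile within that subject.
    print(f"mean={m:.2f}  deviations={[round(d, 2) for d in deviations]}")
```

Because each row of deviations sums to zero, a subject's ipsative scores cannot all lie on the same side of zero. This is one arithmetic reason why opposed clusters can appear in an ipsative analysis, while the subject means remain available for normative comparisons.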


MacAndrew and Forgy’s note serves a 
useful function in clarifying the mathe- 
matical relationships between results ob- 
tained from ipsative versus raw data 
scores. Demonstration of mathematical 
relationships between different results, 
however, does not eliminate the differences.

ERRATA


In the article by Schachter and Singer, which appeared in 
Psychological Review (1962, 69, 379-399) the following correc- 
tions should be made: 

The superscript “a” should precede the word “All” in the 
footnote to Table 2. 

The superscript “a” should appear next to the column head- 
ing “Initiates” in Table 3. 

The following Tables 6-9 should be substituted for those 
which appeared in print. 


TABLE 6

THE EFFECTS OF ATTRIBUTING BODILY STATE
TO THE INJECTION ON ANGER IN THE
ANGER EPI IGN CONDITION

Condition                     N    Anger index
Self-informed subjects        3      -1.67
Others                       20      +2.88
Self-informed versus
  Others                           p = .05

TABLE 7

THE EFFECTS OF ATTRIBUTING BODILY STATE
TO THE INJECTION ON EUPHORIA IN
THE EUPHORIA EPI IGN AND
EPI MIS CONDITIONS

Condition                     N    Activity index
Epi Ign
  Self-informed subjects      8      11.63
  Others                     17      21.14
  Self-informed versus
    Others                         p = .05
Epi Mis
  Self-informed subjects      5      12.40
  Others                     20      25.10
  Self-informed versus
    Others                         p = .10


TABLE 8

SYMPATHETIC ACTIVATION AND EUPHORIA
IN THE EUPHORIA PLACEBO CONDITION

Subjects whose:               N    Activity index
Pulse decreased              14      10.67
Pulse increased
  or remained same           12      23.17
Pulse decreasers versus
  pulse increasers or same         p = .02


TABLE 9

SYMPATHETIC ACTIVATION AND ANGER IN
ANGER PLACEBO CONDITION

Subjects whose:               Nᵃ   Anger index
Pulse decreased              13      +0.15
Pulse increased
  or remained same            8      +1.69
Pulse decreasers versus
  pulse increasers or same         p = .01

ᵃ N reduced by two cases owing to failure of sound
system in one case and experimenter’s failure to take
pulse in another.


PSYCHOLOGICAL REVIEW


IMPRINTING:
AN EPIGENETIC APPROACH ¹

HOWARD MOLTZ ²
Brooklyn College


The analysis presented takes an epigenetic approach as a point of
departure in specifying the manner in which intrinsic and extrinsic
factors participate in the development and organization of the
imprinting response in precocial birds. It suggests that during the early
neonatal period, the response functions to adjust the gross sensory
input provided by the imprinting object so as to make it fall within
the range necessary to evoke a parasympathetically-governed organic
set and that, thereafter, attachment to the object is maintained by a
process of selective learning, with fear reduction functioning as the
reinforcing agent. Several experiments suggested by the present
analysis were discussed.


In 1873 Spalding (reprint Spalding, 
1954) observed that incubator hatched 
chicks tended to follow the first moving 
object to which they were exposed. 
Heinroth (1910) subsequently called 
attention to this phenomenon when he 
reported that graylag geese can be 
made to respond to humans in filial 
fashion in preference to adults of their 
own species if they are exposed to humans
immediately after hatching. Several
years later Lorenz (1935, 1937) extended
the empirical basis of Heinroth’s
observations and, in addition, provided
a theoretical framework within which to
interpret these observations. Lorenz
emphasized that in precocial birds a wide
variety of either animate or inanimate
objects can acquire, in the absence of
any conventional reinforcing agent, the
capacity to evoke certain aspects of
behavior that are ordinarily directed
toward members of the species.

¹ The research presented in this paper was
supported by Research Grants M-2417 and
M-3855 from the National Institutes of
Health, Public Health Service.

² Anyone who has read the article by
T. C. Schneirla (1959) entitled “An
Evolutionary and Developmental Theory of
Biphasic Processes Underlying Approach
and Withdrawal” will recognize the extent
to which it has influenced my thinking
about imprinting. It is a pleasure to
acknowledge this indebtedness. Equally
worthy of mention are the patient and
thoughtful criticisms offered by Evelyn
Raskin from which the present paper
profited immeasurably.

Since the conditions under which an 
object can acquire this capacity, and 
the characteristics of the behavior thus 
evoked, were considered unique, Lo- 
renz designated the process or mechan- 
ism involved by a special term—im- 
printing. To distinguish imprinting 
from associative learning, he empha- 
sized the following aspects of the im- 
printing process: 

1. Imprinting can be established 
only during a restricted period in the 
bird’s life, a period of short duration
presumably corresponding to a critical
stage of physiological development. 
Thus if a graylag gosling, for example, 
is to respond to a human (or to any 
other relatively large moving object) 
in filial fashion, initial exposure to the 
human must occur within several hours 
after hatching. The effect of this ex- 
posure is at first shown by the fact 
that the gosling persistently follows its 
“surrogate parent.” 

2. Once imprinting occurs its effect 
is irreversible. That is, the stimulus 
to which the bird is exposed during the 
brief critical period was assumed to 
become either the preferred stimulus 
or the only stimulus toward which the 
following-response will be directed. 

3. Many aspects of adult behavior 
that are not functional during the 
period in which imprinting becomes 
established will nonetheless be directed 
toward the “imprinted object” when 
functional status is subsequently 
achieved. Lorenz cited the case of 
adult shell parakeets that directed their 
courtship activities toward humans in 
preference to available members of the 
species as a consequence of their hav- 
ing been exposed to humans early in 
ontogeny. This effect was held to be 
due to imprinting and not conditioning 
since, as Lorenz (1955) put it, “. . . you
cannot condition any not-yet-functioning
response in the ordinary way”
(p. 209). 

4. When a bird becomes “im- 
printed” to a particular stimulus, it 
will readily transfer its following-re- 
sponse as well as other social responses 
to all members of the class to which 
that stimulus belongs. In other words, 
imprinting was assumed to result in 
attachment not to the specific features 
of an object but to its general charac- 
teristics. 


TWO APPROACHES TO IMPRINTING


We must begin at this point by em- 
phasizing that there is no experimental 


evidence available to justify Lorenz’s 
confidence that imprinting produces 
irreversible modifications in social and 
sexual behavior. Indeed, all attempts 
to study the imprinting phenomenon 
systematically have tended to concen- 
trate on approach and following be- 
havior in neonatal birds and thus the 
rather casual observations which Lo- 
renz adduced in support of long-term 
effects have not yet been tested. 
Nevertheless, there is still widespread 
agreement with the inference first 
drawn by Lorenz regarding the man- 
ner in which the individual genotype 
is implicated in imprinting. That is, 
the characteristic properties of im- 
printing are thought to be functionally 
represented in an encapsulated set of 
genic determinants which are elabo- 
rated during embryogeny in species- 
typic fashion. The degree of specific- 
ity which this representation is as- 
sumed to involve is exemplified by 
Hess’ (1959a) statement that the crit- 
ical period is “maturationally sched- 
uled”; by Thorpe’s (1956) hypothesis 
that an “innate neurosensory mechan- 
ism operates to release the imprinting 
response”; and by Hinde’s (1961) 
contention that there is an “inherited 
recognition of the characteristics of the 
parents.” Although the embryogenic 
pathways along which these phenotypic 
events are presumed to develop have 
not been made explicit, it would seem 
that such pathways must involve the 
maturation of neural mechanisms that 
are isomorphic with or in functional 
correspondence to particular details of 
the imprinting pattern. In any event,
most contemporary workers would in- 
sist, in agreement with Lorenz (1955), 
that imprinting has “instinctive 
counterparts” by which they mean 
that imprinting has certain features 
which are directly and specifically pro- 
vided for in the growth process itself. 

In marked contrast to this essentially
nativistic view is the epigenetic
conception that the imprinting pattern
is organized during ontogeny through 
the progressive interaction between the 
developing organism and its sensory 
environment. The idea that there are 
genetic determinants uniquely related 
to imprinting is disavowed and in its 
place is substituted the view that im- 
printing, as well as any other species- 
typic response pattern, arises from the 
integrative influence on development of 
both intraorganic processes and ex- 
trinsic stimulative conditions. 

It should be emphasized that the 
epigenetic approach is not to be con- 
strued, as Hess (1958, 1959a, 1959b) 
has repeatedly done, as an attempt to 
minimize the extent to which endoge- 
nous events participate in the organiza- 
tion of behavior. Certainly it is axi- 
omatic that all behavior at all phyletic 
levels is influenced in complex and di- 
verse ways by such events. But what 
is not self-evident, and indeed what is 
explicitly repudiated by the epigenetic 
approach, is the “template conception” 
of the nativist which maintains that 
species-typic response patterns result
from the passive translation of genetic 
factors into behavior through the me- 
dium of tissue growth and tissue dif- 
ferentiation. The epigeneticist, to be 
sure, does focus attention on matura- 
tionally determined processes, but he 
does so as part of a basic approach 
which seeks to understand the manner 
in which these processes combine with 
extrinsic stimulative conditions to 
structure and organize temporally-in- 
tegrated behavior. It is within this 
conceptual framework that the epi- 
geneticist would look for an explana- 
tion of the imprinting phenomenon. 


WHY IMPRINTING OCCURS


In an article published recently in 
the Psychological Bulletin I tried to
provide an epigenetic analysis of
imprinting (Moltz, 1960). This analysis
attempted to account for the organization
of the following-response and for
the termination of the critical period 
by employing what was essentially a 
classical-instrumental conditioning se- 
quence centering on the arousal and re- 
duction of emotionality. However, 
since the appearance of this article, it 
has become evident that such a se- 
quence by itself would fail to explain 
what is perhaps the most striking fea- 
ture of the imprinting pattern: the 
fact that, when a precocial bird is ini- 
tially introduced to a moving object 
during the latter half of the critical 
period (i.e., at about 13-20 hours after 
hatching), it will often begin to follow 
that object within 60 seconds. It was 
entirely fortuitous that until recently 
most of the birds in our laboratory 
were first presented with the imprint- 
ing object when they were approxi- 
mately 8-10 hours old, and since at 
that age following does not occur with 
dramatic abruptness, we were un- 
aware that this indeed could be the 
case. But it is now quite clear that if 
locomotor and attentive development 
is adequate, close attachment to the 
object can appear within a remark- 
ably short time after initial exposure 
and consequently could not result 
solely from the classical-instrumental 
conditioning sequence that I had ini- 
tially proposed. Although it is diffi- 
cult to specify exactly how long this 
sequence must take, 60 seconds would 
seem insufficient. My earlier explana- 
tion must therefore be either modified 
or dismissed. Since dismissal is the 
more drastic verdict, let us examine 
the possibility of modification. 
Consider the fact that the typical 
imprinting apparatus consists of either 
a circular runway or a rectangular 
alley in which an object is made to 
travel. In our laboratory, a bird be- 
tween 8 and 20 hours of age is in- 
troduced to this apparatus for a period 
of 25 minutes. When initial approach
movements are exhibited at this time,
it is significant to note that they are 
invariably directed toward the object 
as the object retreats but not as it 
moves toward the animal. These early 
movements are made when the bird is 
either sitting or standing still and they 
usually consist of turning the head and 
stretching the neck. During the past 
5 years we have worked with over 
600 birds and I cannot think of a 
single instance in which approach was 
initiated as the object moved toward 
the bird; indeed, when traveling in 
that direction, the object will occasion- 
ally evoke withdrawal responses. 
Perhaps the explanation of this be- 
havioral specificity lies in the differ- 
ence between the type of retinal stim- 
ulation produced by an approaching 
object and by one which retreats. For 
a stationary animal, a retreating visual 
stimulus would engage progressively 
fewer and fewer retinal elements, 
while an approaching stimulus would, 
of course, engage progressively more 
and more such elements. Now, there 
is a great deal of evidence, recently in- 
tegrated in a most impressive manner 
by Schneirla (1959), which indicates 
that for almost all phyletic levels be- 
havior is largely governed in the very 
early ontogenetic stages of develop- 
ment by stimulus intensity rather than 
by stimulus quality. That is to say,
the absolute level of afferent stimula- 
tion as well as the nature of the 
changes which occur in that level ap- 
pear to be of crucial importance in 
determining whether approach or with- 
drawal responses will be evoked and 
whether concurrent visceral and car- 
diac activities of either a vegetative 
and homeostatic nature or of an in- 
terruptive and emergency nature will 
prevail. In general, the evidence sug- 
gests that low or decreasing levels of 
excitation will selectively elicit approach
responses accompanied by a
parasympathetically governed organic
set, while more intense excitation will 
selectively elicit withdrawal responses 
and an organic set that is largely a 
function of sympathetic predominance 
and adrenin secretion. Although 
much of the data on which this gen- 
eralization is based are far from satis- 
factory and although exceptions are 
certainly to be found, it appears that 
in the very young animal, autonomic 
discharge and gross skeletal adjust- 
ments are governed by stimulus mag- 
nitudes acting on the central nervous 
system and that, depending on species 
capacities and on the particular inner- 
vational level involved, either an “A- 
type” (parasympathetic) set or a “W- 
type” (sympathetic) set will occur ac- 
companied by approach and withdrawal 
responses, respectively. 
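
The geometric part of this argument, that a retreating object engages progressively fewer retinal elements and an approaching one progressively more, can be sketched numerically. The example below is an illustration only: it uses the visual angle subtended by the object as a stand-in for the number of retinal elements engaged, which is an assumption made here, and the object width and distances are hypothetical values, not measurements from any experiment. The function name `visual_angle_deg` is likewise introduced for this sketch.

```python
import math


def visual_angle_deg(object_width, distance):
    """Visual angle (degrees) subtended by an object viewed head-on.

    Both arguments must be in the same unit of length.
    """
    return math.degrees(2.0 * math.atan(object_width / (2.0 * distance)))


# Hypothetical object width and viewing distances, in inches.
WIDTH_IN = 5.0
distances_in = [7.0, 14.0, 21.0, 28.0]

# Angle subtended at each successive distance as the object retreats.
angles = [visual_angle_deg(WIDTH_IN, d) for d in distances_in]

# Change in angle between successive positions: negative while the
# object retreats, and of opposite sign if the sequence is reversed
# (that is, if the object approaches).
changes = [later - earlier for earlier, later in zip(angles, angles[1:])]

for d, a in zip(distances_in, angles):
    print(f"distance={d:5.1f} in  visual angle={a:6.2f} deg")
```

For a stationary observer the subtended angle falls at every step of the retreat, and the same steps taken in reverse order give a rising sequence. On the assumption stated above, this corresponds to decreasing and increasing retinal stimulation, respectively.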

If precocial birds do not constitute 
an exception, it would seem reasonable 
to suggest that the tendency of our 
ducklings to respond initially to the 
imprinting object as it retreats but 
not as it approaches could result from 
the decreasing stimulation which the 
retreating object provides. In other 
words, I am proposing that the initial 
valence of this object is a function of 
its quantitative effects—more specifi- 
cally, the progressive reduction in 
retinal innervation which it periodi- 
cally affords. Moreover, I am pro- 
posing that this reduction in innerva- 
tion not only induces incipient ap- 
proach movements, but also a particu- 
lar constellation of visceral and cardiac 
events.

Of course, the approach movements 
initially evoked by the retreating ob- 
ject do not remain incipient; indeed, 
if the locomotor capacities of the ani- 
mal permit, they become rapidly trans- 
formed into well-oriented following. 
We have already mentioned that it is 
not unusual for following to occur during
the critical period. Perhaps this
behavior functions both to keep the
object centered in the visual field and 
maintained at a relatively fixed dis- 
tance from the animal, thus serving to 
regulate intensity of stimulation. It 
is conceivable that the more or less 
stable excitation level achieved in this 
way is the excitation level needed to 
sustain the organic set that was only 
episodically evoked when the bird was 
not following. In other words, I am 
suggesting that when following does 
occur during the critical period it af- 
fords an energy compromise between 
the positively valenced decreasing 
stimulation from the retreating ob- 
ject, and the either neutrally or nega- 
tively valenced increasing stimulation 
from the approaching object, this com- 
promise being effected at a level of 
excitation required to evoke a con- 
tinuous A-type set. Indeed, it is pre- 
cisely such a resolution which is sug- 
gested by the familiar picture of the 
duckling stretching its neck in the di- 
rection of the retreating object while 
issuing so-called contentment notes 
which may change abruptly to distress 
notes as the object approaches, then 
following sporadically and hesitantly 
while issuing both types of notes, and 
finally following consistently with only 
contentment notes being voiced. We 
can now understand the incipient ap- 
proach movements as being initiated 
by magnitude relationships of a kind 
optimal for the arousal of an A-type 
set and early following as functioning 
adjustively to sustain that set. 

But what can be said of the role of 
other visual stimuli in inducing imprinting?
Several experimenters (Abercrombie
& James, 1961; James,
1959; Smith & Hoyes, 1961) have 
recently reported that precocial birds 
will move toward a stationary flashing 
light if initial exposure takes place 
during the critical period. The important
properties of this light appear
to be its intensity and the size of the
aperture through which it is displayed 
as evidenced by the fact that variations 
in one or both of these parameters 
significantly affect approach. After 
careful investigation, James (1960) 
concluded that behavior induced by a 
flashing light is analogous to following 
induced by a moving object. Now one 
could agree with James that the re- 
sponse manifested in each situation 
has similar properties. Indeed, I 
would venture further and maintain 
that these responses are not only func- 
tionally equivalent but homologous. 
Thus, whether the stimulus be a flash- 
ing light or a moving object, the neo- 
natal bird adjusts the gross input pro- 
vided by the stimulus until the re- 
sultant level of excitation comes within 
the range necessary to evoke an A- 
type organic set. Of course, in one 
case the adjustive response consists 
of following an object in motion with 
the most significant determinant of ex- 
citation being the number of retinal 
elements engaged; in the other case, it 
consists of moving toward a flashing 
light, where the important determinant 
apparently is the ratio of the number 
of such elements to the intensity of the 
source. Despite these differences, 
both responses have in common the 
same causal antecedents and the same 
innervational consequences: both are 
initiated by given stimulus magnitudes 
acting on the central nervous system 
and both serve to adjust these magni- 
tudes until an optimal level of excita- 
tion prevails. In brief, I am suggest- 
ing that it is this resolution of neural 
excitation achieved through the modu- 
lation of gross visual input, which 
constitutes the essential characteristic 
of visual imprinting. Furthermore, 
irrespective of the visual source employed,
curves describing the relationship
between quantitative features of
the proximal stimulus and measures of
imprinting behavior are likely to assume
the shape of an inverted U. Although
many parametric studies would
of course be needed to determine the 
precise values for each stimulus class, 
I should be willing to predict that, 
when such studies are performed, we 
shall find a family of curves which will 
rise at a low input level, reach a peak 
fairly rapidly, maintain this maximum 
value over a rather narrow quantita- 
tive band, and then begin to decrease. 
Encouragingly enough, several experi- 
ments have already been carried out 
which appear consistent with this pre- 
diction. Moltz, Rosenblum, and Stett- 
ner (1960), for example, restrained 
ducklings from following a 3 × 5 inch
box that described an
elliptical path at right angles to the 
animal’s line of vision and found ex- 
posure distance to be a variable of 
crucial importance in determining 
whether or not their birds followed 
when subsequently allowed to govern 
their own behavior. Specifically, of 
30 animals exposed 7 inches from the 
box, none subsequently followed; of 
24 exposed at a distance of 14 inches, 
18 subsequently followed. James 
(1959) has also reported that chicks 
will approach an intermittent light 
source of about 80 foot-candles if that 
source is displayed through a 1-inch 
aperture but not if displayed through 
a 6-inch aperture, while Smith and 
Hoyes (1961) reported that at lower 
intensities (i.e., at intensities of
approximately one foot-candle and less)
approach and aperture size are posi- 
tively related. Such experiments cer- 
tainly indicate the significance of re- 
search designed to investigate systemat- 
ically the influence of quantitative vari- 
ations in visual input on imprinting. 
One additional point remains to be 
discussed in the present context. Several
studies (e.g., Abercrombie &
James, 1961; James, 1960) have indicated
that a flashing light is much
more likely to induce approach than a 
continuous light of the same intensity. 
This, of course, implies that either in- 
tensity is an unessential variable in 
imprinting or that it is essential but 
that it is not sufficient. The problem 
can perhaps be resolved if one con- 
siders the possibility that a flashing 
light is more effective than a con- 
tinuous light for the same reasons that 
a moving object is more effective than 
a stationary object. A flashing light 
and a moving object may be considered 
alike in that both constitute sources 
of intermittent stimulation and, by 
virtue of this intermittency, become 
perceptually salient at a time when the 
bird is uncommitted to an imprinting 
stimulus—that is, when it is initially 
introduced to the apparatus. 

Let us deal first with the question of 
input: a flashing light is an obvious 
source of intermittent stimulation, but 
what of a moving object? We have 
already mentioned that the early ap- 
proach movements (turning the head 
and stretching the neck toward a re- 
treating object) are usually made 
when the bird is either sitting or 
standing still. Since the avian eye pos- 
sesses a highly plicated pecten which 
casts shadows on the retina as it pro- 
jects into the vitreous humor from the 
point of entrance of the optic nerve, 
a moving object, subtending a chang- 
ing visual angle (as it does, of course, 
in relation to a stationary animal), pro- 
duces successive rises and falls in mean 
illumination of the retina—in other 
words, it produces flicker and, as 
Menner (1938; as cited by Pumphrey, 
1948) has pointed out, it does so to a 
marked degree.

But even if one were to grant that 
a moving object and a flashing light 
have similar stimulative properties for 
the bird, the question still remains as 
to why they function more effectively
as imprinting stimuli than their largely
nonintermittent counterparts. Per- 
haps the answer to this question is 
related to the differential focusing of 
visual attention during the first minute 
or two after the bird is introduced to 
the apparatus. It is reasonable to expect
intermittent input to be more
likely to induce receptor orientation 
which then makes possible an excita- 
tion level that initiates either approach 
or retreat. Thereafter, an A-type set
is able to function to sustain attach- 
ment or a W-type set to perpetuate 
withdrawal. This is not to say that a 
stationary object, for example, cannot 
serve as an imprinting stimulus. Gray 
(1960) and Abercrombie and James 
(1961) have shown that this is possi- 
ble, but it should do so—and in fact does 
so—only in the absence of competitive 
visual stimuli and only under condi- 
tions designed to channel visual atten- 
tion. Similarly, I think a continuous 
light will also evoke either approach 
or withdrawal depending on the gross 
input it provides, but rarely if placed 
in apposition with a flashing light. 
Since visual stimuli have been most 
often employed in imprinting, we have 
thus far confined our analysis to the 
role of moving objects and flashing 
lights. But there is no reason to as- 
sume that a flashing light, for example, 
possesses greater valence for a neo- 
natal bird than would a repetitive tone. 
Indeed, although each sensory system 
regulates neural discharge in special- 
ized ways, we would expect essentially 
the same excitational-behavioral rela- 
tionships to prevail irrespective of the 
modality involved. Thus, Collias and 
Collias (1956) have found that sev- 
eral species of ducklings evince a 
strong tendency to move in the direc- 
tion of a repetitive tone. Fabricius and 
Boyd (1953), working with many of 
the same species, reported that rhythmic
repetition of a retreating sound
will elicit strong following irrespective
of whether that sound is associated 
with a moving object. 

It is significant that, in testing a 
variety of acoustical stimuli, Collias 
(1952) found amplitude to be of de- 
cisive importance in influencing the 
behavior of neonatal chicks. More 
specifically, low amplitude tones 
evoked approach responses and “con- 
tentment notes,” while their high 
amplitude counterparts evoked with- 
drawal responses and “distress calls.” 
These data can be taken to indicate 
that if a repetitive tone is to function 
as an imprinting stimulus, then it, too, 
must lie within a low intensity band; 
that like a flashing light or a moving 
object it must be capable of initiating a 
level of excitation which can be ad- 
justed to fall within the range neces- 
sary to evoke A-type processes. 

Viewed in terms of their broad theo- 
retical implications, the data make it 
gratuitous to conceive of imprinting 
stimuli as comprising a special class of 
sensory events uniquely relevant to the 
expression of filial behavior. Although 
the young of precocial birds certainly 
evince attachment to the parents under 
natural conditions, the stimuli eliciting 
this attachment seem neither to derive 
from any specific avian characteristic 
nor to activate any specific neurosen- 
sory mechanism related to imprinting 
as such. 


CRITICAL PERIOD AND THE PERSISTENCE
OF THE IMPRINTING RESPONSE


So far these speculations have not 
touched upon the problem of the criti- 
cal period. Why indeed must initial 
exposure to the imprinting stimulus 
take place during a relatively brief 
stage after the occurrence of hatching?
What is there about the animal at this 
time which makes it the most propitious
occasion for initiating attachment
to that stimulus?

Perhaps an explanation can be 
sought in the nature of the perceptual 
environment of the neonatal vertebrate 
at the time of birth or hatching. Al- 
though the answer must be qualified in 
every case by reference to the prenatal 
attainments of the particular vertebrate 
form being considered, the environ- 
ment of the neonate appears to be 
structured largely in terms of crudely 
directionalized energy gradients to 
which adjustive movements are made 
in terms of the organic conditions pre- 
vailing at the moment. There is a 
great deal of evidence (e.g., Riesen, 
1958, 1961a, 1961b; Riesen & Aarons, 
1959; Siegel, 1953) indicating that 
patterned-light experience is required 
before this world of energy gradients 
is elaborated and refined to permit 
response to object qualities. Indeed, 
such experience appears to be impli- 
cated in vertebrate ontogeny from the 
earliest stage and seems essential in 
developing the capacity to react to ob- 
jects and to other sensory stimuli in 
terms of dimensions more subtle than 
the gross input which they offer. 

Thus the blind neonatal kitten, for 
example, approaches the mother in re- 
sponse to the gross tactual and thermal 
stimulation provided by her abdomen, 
but soon becomes capable of respond- 
ing along a much more qualitative per- 
ceptual continuum as first indicated by 
its choice of a particular nipple (Sch- 
neirla & Rosenblatt, 1961). There- 
after, the kitten’s perceptual environ- 
ment enlarges and becomes specialized 
so as to include the mother as a 
meaningful object rather than as a 
localized sensate source.

In brief, what I am suggesting is 
that for each vertebrate form there is 
a period of development during which 
behavioral arousal is largely or exclusively
determined by stimulus intensity
and, moreover, that this period is
nonrecurrent, terminating early in ontogeny
as the animal’s perceptual environment
expands to permit greater specificity
of afferent control, as associative
learning enters to modify previous be- 
havioral adjustments, and, most likely, 
as maturation of the nervous system se- 
lectively changes neural excitatory 
thresholds to facilitate more precise 
sensory-motor integration. 

Of course, whether or not we ordi- 
narily designate any ontogenetic stage 
as critical in the development of a given 
species depends on the degree to 
which its influence on behavior is apparent
to us. In the case of the duckling,
the influence of the early neonatal
period is certainly apparent: the ani- 
mal suddenly approaches and soon be- 
gins to follow what typically is the 
first object seen. In the case of the 
kitten, the changes in behavior occur 
somewhat more slowly and conse- 
quently are not as striking. But for 
both the duckling and the kitten the 
period involved is critical insofar as 
it is the only period during which 
stimuli are responded to primarily in 
terms of the input level they provide. 

But if this is in fact so, then im- 
printing would seem to present a spe- 
cial problem, for the duckling continues 
to follow after the critical period has 
ended, that is, after the role initially 
exerted by input level has been pre- 
sumably rendered ineffective. What 
then functions to sustain this response 
after the termination of the period dur- 
ing which it was established? It is 
evident that the animal’s continued at- 
tachment to the object must, ex hy- 
pothesi, be a consequence of the oper- 
ation of processes different from those 
responsible for the initial evocation. 
Perhaps one of these processes in- 
volves a crude sort of contiguity con- 
ditioning. Certainly it would seem 
reasonable to expect perceptual and 
motivational adjustments to occur
rapidly in precocial animals and, 
through a succession of such adjust- 
ments, to expect A-type processes to 
become cued to concurrent focal stim- 
uli. In the imprinting situation spe- 
cifically, the temporal conjunction of 
the visually prepotent object and the 
organic set induced through the reso- 
lution of visual input might thereby 
result in components of that set be- 
coming conditioned to certain qualita- 
tive aspects of the object. Should this 
be the case—if at least some of the 
visceral and cardiac elements of this 
endogenously facilitated set are elicita- 
ble as conditioned responses—then, 
after the critical period has ended, at- 
tachment to the object might be main- 
tained by a process of selective learn- 
ing, with fear reduction functioning as 
the reinforcing agent. More specifi- 
cally, we can assume that A-type ele- 
ments are incompatible with fear, that 
contact with the object evokes these 
elements, and that any response in- 
strumental in effecting contact will be 
strengthened through fear reduction. 
Of course, this sequence presupposes 
that the imprinting situation arouses 
fear. As a matter of fact, there can be 
little doubt that it has just such an ef- 
fect. Several investigators have com- 
mented that, beginning at 20 or 30 
hours of age, distress calls and startle 
responses are invariably displayed 
whenever a bird is in the apparatus but 
is not in proximity to the imprinting 
object. Ramsay and Hess (1954), for 
example, noted that this was a char- 
acteristic feature of mallard behavior 
and Fabricius (1951) made the same 
observation for tufted ducks and eiders. 
Hinde, Thorpe, and Vince (1956) 
speak of “fleeing” occurring under 
these conditions and Jaynes (1957) 
finds the domestic chick responding in 
a similar manner. It is important to 
realize that the apparatus itself
induced the fear and not object
separation, as evidenced by the fact that
animals not “imprinted” during the
critical period subsequently displayed the
same emotional behavior but avoided 
the object. Equally significant is the 
fact that fear was absent when an im- 
printed bird was either approaching 
or following; at that time even the 
sudden introduction of a loud sound 
often failed to be disturbing. 

On the assumption, then, that a se- 
lective learning process involving fear 
reduction enters to sustain the im- 
printing response, we can predict that 
the response will persist until either 
fear arousal ceases or until the “re- 
ward value” of the object is extin- 
guished. Indeed, we can proceed fur- 
ther and predict that attachment to the 
object will increase under conditions 
calculated to increase fear and will be 
weakened as a consequence of habitu- 
ation to the imprinting situation. 

One additional question arises in 
this connection: what is there about 
the imprinting apparatus which arouses 
fear? Even more precisely, why does 
the typical imprinting procedure cause 
birds older than 20 hours to react to 
the apparatus with distress calls and 
startle responses? 

Recent studies of the genesis of 
emotional behavior in several
vertebrates (e.g., Hebb, 1946;
Jersild, 1954; McBride & Hebb, 1948; 
Melzack, 1954) have shown that strong 
fear can frequently be evoked by 
strange, although innocuous, visual 
stimuli in the absence of specific avoid- 
ance conditioning. What was found 
necessary for this development was a 
period of early sensory contact with a 
structured environment which estab- 
lished the “visually familiar”; subse- 
quently, exposure to new, “visually un- 
familiar,” situations differing markedly 
from the original evoked fear. 

Now consider some pertinent events 
in the life history of a 20-hour-old
bird that is a subject in the typical im- 
printing experiment. It was removed 
from the incubator shortly after hatch- 
ing and placed in a cage in which it has 
spent the first 20 hours of its life. 
Since perceptual development can be 
expected to occur rapidly in precocial 
animals, it is reasonable to assume that 
20 hours are sufficient to establish the 
cage and its surroundings as the visu- 
ally familiar environment. Although 
the bird was also exposed to the im- 
printing apparatus during these 20 
hours, the exposure was brief (con- 
sisting of one or, at most, two 25- 
minute periods) and probably provided 
little opportunity for any feature of 
the apparatus to attract perceptual re- 
gard unless it was immediately salient 
by virtue of gross input, as in the case 
of the imprinting object. I think it is 
these initially nonobserved features 
which become strange and therefore 
fearful as a result of the animal’s rela- 
tively protracted cage experience. 
Evidence has already been obtained 
(Moltz & Rosenblum, 1958) which sug- 
gests that such stimuli will continue 
to evoke fear until, through habitu- 
ation, they also attain the status of 
the visually familiar. 


EXPERIMENTAL EVIDENCE RELEVANT 
TO THE PRESENT ANALYSIS 


The formulation that has been pre- 
sented up to this point appears capable 
of integrating at least most of the cur- 
rently available data on imprinting. 
Now the question arises as to whether 
it can generate new functional rela- 
tionships—in other words, can it pre- 
dict? Certainly, a positive answer to 
this question would help to support 
the validity of our explanation. Let 
us therefore consider several different 
experiments which were suggested by 
the present analysis.

The first of these investigates the effect
of decreasing retinal stimulation on the 
strength of imprinting. It will be recalled 
that the progressive reduction in retinal 
stimulation and neural energy initially af- 
forded by a retreating object was thought to 
elicit the approach movements which pre- 
cede following. Although such reduction 
was not considered important in all cases of 
imprinting—and, indeed, should not be con- 
sidered so—we did regard it as important in 
the typical test situation, adducing in evi- 
dence the observation that a duckling will 
most often respond initially to an object as 
that object moves away but not as that ob- 
ject approaches. Recently, Stephen Mindel 
and I attempted to obtain more systematic 
evidence on this point by providing different 
groups of Peking ducklings with varying 
conditions of retinal stimulation (unpub- 
lished data). For 25 minutes per day for 
two days, beginning at 10-16 hours after 
hatching, our birds were placed in a wooden 
stock designed to restrict movements of the 
head and body. While in the stock, each 
subject faced a specially constructed en- 
closure containing a pulley system which 
permitted us to present an imprinting object 
under four different conditions: as it moved 
away from the bird, moved toward it, 
alternately moved toward and away, or re- 
mained stationary. The imprinting objects 
were four green cardboard boxes identically 
constructed and measuring 3 inches × 6
inches. When in motion, the boxes travelled 
a distance of about 5 feet in one direction. 
In the case of birds assigned to either the 
“retreat” or “approach” groups, an object 
disappeared from view at one end of the 
enclosure and was immediately replaced by 
its duplicate at the opposite end. This 
procedure was, of course, unnecessary for 
subjects assigned to the “alternation” and 
“stationary” groups. It should be empha- 
sized that, irrespective of group assignment, 
an imprinting object was always in view 
throughout the two 25-minute exposure
trials. 

Twenty-four hours after the termination 
of the second exposure trial, each subject 
was placed individually in our imprinting 
apparatus for a 20-minute period during 
which it was permitted to govern its own 
behavior. The apparatus consisted of an 
unpainted wooden alley, 10 feet × 3 feet,
containing a motor and a pulley system. 
One of the green boxes was made to travel 
about the alley and as it did we recorded 
the number of seconds that the subject fol- 
lowed this box as well as other aspects of 
behavior such as avoidance responses and
distress calls. It is important to note that 
when our animals were introduced into the 
imprinting alley, one half were placed facing 
in the direction in which the object moved 
while the remaining one half were made to 
face in the opposite direction. 


Two critical predictions can be made 
in the present situation. First, if re- 
duction in retinal stimulation con- 
tributes to the development of the fol- 
lowing response, then those subjects 
initially exposed to the retreating ob- 
ject should subsequently respond more 
strongly than the subjects of any other 
group. Secondly, if the effectiveness 
of such reduction is mediated through 
the arousal of A-type visceral and 
cardiac responses, then we should ex- 
pect these responses to become cued to 
the object and consequently expect the 
“retreat group” to exhibit fewer signs 
of emotionality than any other group 
when later tested in the presence of 
this object. 

It can be seen from Figure 1 that 
the retreat group was clearly su- 
perior to the approach and alterna- 
tion groups with respect to strength 
of following; the latter groups did not 
differ significantly from each other 
but were, in turn, superior to the sta- 
tionary group. Analysis of variance 
indicated that this pattern was signifi- 
cant below the .01 level of confidence 
(F = 7.78, with 1 and 76 degrees of
freedom). Also relevant is the fact 
that the subjects of the retreat group 
were significantly less “emotional” 
than the subjects of any other group. 
A chi square test involving the num- 
ber of avoidance responses exhibited 
during the first 5 minutes in the im- 
printing apparatus yielded a probabil- 
ity value of less than .01 (chi square 
= 11.60). 

Although the behavior of the re- 
treat group was quite in accord with 
our predictions, the failure to obtain 
a significant difference between the 
alternation and approach groups is
disappointing. We had expected that 
the approach group, deprived com- 
pletely of the opportunity to view 
a retreating object, would exhibit less 
following than the alternation group. 
Perhaps the difference expected in this 
case would have been found had the 
stock in which the birds were placed 
been constructed to prevent any move- 
ment of the head. As it was, the 
upper part of the stock was fitted 
around the neck so that the subject 
was free to move its head vertically 
and horizontally and, of course, free 
to open and close its eyes. Each bird 
was thus able to vary retinal input 
considerably and, in so doing, to vitiate 
the intended visual effect of its ex- 
perimental treatment. Had this ad- 
mixture of input with the consequent 
blurring of treatment distinctiveness 
not occurred, a “pure” approach con- 
dition would have been obtained which 
might have resulted in very little fol- 
lowing as compared with a corre- 
spondingly pure alternation condition. 
Since it is obviously difficult to prevent 
contamination mechanically, perhaps 
anesthetization producing immobility
of the head and eyes is the answer. 
A graduate student of mine is cur- 
rently investigating the possibility of 
using various anesthetics for this 
purpose. 


Our second study is concerned with the 
duration of the critical period. The point 
has already been made that attachment to 
an imprinting stimulus is most likely to be 
initiated within a relatively short time after 
hatching. We suggested that this period 
could be considered a nonrecurrent onto- 
genetic stage during which behavioral 
arousal is governed almost solely by stim- 
ulus intensity and consequently represents 
a stage during which approach and follow- 
ing function to modulate prevailing levels of 
excitation. Furthermore, we proposed that 
perceptual experience is essential in de- 
veloping the capacity to react to the quali- 
tative aspects of sensory events rather than 
merely to the gross input they provide. The 
critical period was conceived as drawing 
to a close partially as a consequence of this
development. 

If commerce with a structured and articu- 
lated perceptual environment is implicated 
in the circumscription of the critical period, 
then we should expect that the time during 
which it is possible to initiate attachment 
to an imprinting stimulus could be con- 
siderably extended by reducing the animal’s 
visual experience prior to its initial intro- 
duction to the imprinting situation. An ex- 
periment performed in my laboratory with 
L. Jay Stettner was designed to test this 
expectation (Moltz & Stettner, 1961). 

Peking ducklings were removed from a 
forced-air incubator within approximately 4 
hours after hatching and assigned either to 
an experimental or to a control treatment. 
Each experimental subject had placed over 
its head a latex hood which allowed the 
retinas to be stimulated by brightness dif- 
ferences but prevented the perception of 
visual forms. The control subjects were 
also fitted with a hood, but one in which 
holes had been cut to permit the retinas to 
receive normal patterned-light stimulation. 
The experimental and control groups were
further divided into four subgroups, hence- 
forth designated as the 12-, 24-, 48-, and 
72-hour subgroups, respectively. The sub- 
group designations indicate the mean age at 
which the first imprinting trial was ad- 
ministered. For example, a subject assigned 
to a 48-hour subgroup was kept in its cage 
under the diffuse light condition imposed by 
the experimental treatment or under the 
patterned-light condition imposed by the 
control treatment until it was approximately
48 hours old, at which time the hood
was removed and the subject was placed in 
the imprinting apparatus for the start of the 
first imprinting trial. Each subject was 
given two such trials, the second trial being 
administered 24 hours after the first. An 
imprinting trial consisted of a 25-minute 
period of exposure to a green cardboard test 
object that moved about our imprinting 
alley. The number of seconds that the sub- 
ject spent following the object was recorded 
and a following-score was computed by 
averaging the scores obtained during the two 
imprinting trials. 


It can be seen from Figure 2 that 
the experimental and control sub- 
groups were nearly equivalent with re- 
spect to strength of following when 
first exposure to the test object oc- 
curred at 12 hours of age but were 
markedly different when first expo- 
sure occurred at either 24 or 48 hours. 
Indeed, at an exposure age of 48 hours 
the control subgroup showed little fol- 
lowing, whereas at the same exposure 
age the experimental subgroup showed 
as much following as that exhibited by 
experimental subjects run during the 
critical period (i.e., at 12 hours).
However, when an exposure age of 72 
hours was employed, the level of fol- 
lowing attained by the experimental 
subgroup was not significantly greater 
than the level attained by its control 
counterpart. 

Statistical analyses were in accord 
with the interpretation of Figure 2. 
It should also be noted that the same 
conclusions were reached when the fol- 
lowing score obtained by each subject 
during either the first or second
imprinting trial was substituted for its
average score over both trials in com- 
puting the subgroup medians. 

A finding of particular interest in 
the present study is that those experi- 
mental subjects whose initial imprint- 
ing trial occurred at 72 hours of age 
did not follow. Although patterned- 
light deprivation resulted in vigorous 
pursuit of the test object when an ex- 
posure age of 48 hours was employed, 
an additional 24 hours under the same 
visual conditions resulted in little or no 
following. Although it is, of course, 
conceivable that patterned-light depri- 
vation for as long as 72 hours might 
have damaged the visual system of the 
bird, it is unlikely that failure to fol- 
low was due to inadequate perception 
of the test object. Observation of the 
behavior of the experimental subjects 
while they were in the imprinting ap- 
paratus, as well as ophthalmoscopic 
examination, indicated no visual deficit. 

It is more likely that the failure of 
these birds to follow is attributable to 
cumulative changes of an essentially 
nonvisual nature which occurred dur- 
ing the 72 hours that intervened be- 
tween hatching and testing. We have 
already mentioned that, in addition to 
perceptual experience, neural matura- 
tion and selective learning are prob- 
ably also involved in the circumscrip- 
tion of the critical period. In any 
event, the present results certainly 
demonstrate that depriving ducklings
of the opportunity to experience a 
structured or articulated visual en- 
vironment prior to their being exposed 
to the imprinting situation significantly 
extends the time during which follow- 
ing can be initiated. The fact of this 
extension not only confirms an impli- 
cation of our analysis but also suggests 
the necessity of investigating the on- 
togenetic milieu of a neonatal organism 
before ascribing critical periods in
behavioral development to the unfolding
of an innate growth plan.


TABLE 1

MEDIAN FOLLOWING SCORE OF EACH ANIMAL
DURING DAYS 4 TO 10

                                          Pairs
Group          1      2            3      4    5            6    7      8
Experimental   746    223          891    893  40           57   485    228
Control        1,188  [illegible]  1,193  916  [illegible]  153  4,023  753


Finally, we conducted a series of studies 
based on our analysis of why the imprinting 
response persists after the critical period 
has terminated. It was proposed that 
A-type visceral and cardiac responses, at 
first endogenously induced, subsequently be- 
come cued to the imprinting object and 
thereafter function to sustain attachment to 
that object under fear-evoking conditions. 
Accordingly, we can predict that the 
strength with which attachment will con- 
tinue will be some positive function of the 
level of emotionality induced by the im- 
printing situation. Two experiments were 
designed in accord with this hypothesis 
(Moltz & Rosenblum, 1958; Moltz, Rosen- 
blum & Halikas, 1959). 

In the initial study, we exposed Peking 
ducklings for 25 minutes per day for 3 
days to a cardboard cube that moved about
our imprinting alley. Those birds which 
showed evidence of strong following were 
retained for further study and paired on the 
basis of the scores they obtained on Day 3. 
One member of each pair was then assigned 
to the experimental treatment and one to 
the control treatment. Beginning on Day 
4, the experimental subjects were given 
daily 1-hour habituation sessions, each ses- 
sion consisting of placing the subject indi- 
vidually in the imprinting alley in the ab- 
sence of the object. At the conclusion of 
each session the object was returned to the 
alley and the bird was permitted to follow. 
The control subjects were treated in ex- 
actly the same manner as the experimental 
subjects except that beginning on Day 4 the 
control subjects were placed in a discrimi- 
nably different situation for 1 hour each 
day. 


If habituation to the imprinting 
situation reduces emotionality, then 
there should be a decrement in the 
strength of the following response.
Table 1 shows the median following 
score accumulated during Days 4-10. 
It can be seen that, in the case of each 
pair, the level of following maintained 
by an experimental subject was con- 
siderably lower than that maintained 
by its control counterpart. Also rele- 
vant is the fact that our control sub- 
jects were distinctly more “emotional” 
than our experimental subjects, as in- 
dicated by distress calls and startle 
responses when the object was out of 
sight during testing. 

The second study in this series was 
designed to test an additional impli- 
cation of our hypothesis: the animal’s 
attachment to the object should be in- 
creased under conditions calculated to 
increase fear. We assumed that the 
greater the similarity between cues 
present during the application of elec- 
tric shock and those present in the im- 
printing situation, the more intense 
will be the level of emotionality in that 
situation and consequently the stronger 
the following. 

We again exposed Peking ducklings 
to a cardboard cube for 25 minutes per 
day for 10 days. Prior to the regularly
scheduled exposure on Day 7,
the object was removed from the alley 
and the subjects were assigned to one 
of four treatment groups. Each
treatment consisted of confining the subject
for 15 minutes in a glass compartment 
the floor of which was constructed of
stainless steel rods for the purpose of
delivering shock. One half of the
subjects were confined in the electric
shock compartment when it was placed
in the alley; the remaining one half
were confined in the compartment
when it was outside the alley. The
subjects were further divided into two
additional subgroups. One subgroup
received a series of brief but intense
electric shocks during the period of its
confinement; the other subgroup was
not shocked.


TABLE 2

MEDIAN FOLLOWING SCORE OF EACH GROUP
(TRIALS 7-10) AND THE VALUES OF U
BASED ON THE WILCOXON TEST

Group                         Median
Shock inside alley (SI)       1,025
Shock outside alley (SO)      758
Placed inside alley (PI)      530
Placed outside alley (PO)     751

Groups Compared               U
SI versus SO                  3*
SI versus PI                  2*
SO versus PO                  16
PI versus PO                  11

* Significant at less than the .025 level of confidence.

Table 2 shows the median following- 
score obtained by each experimental 
subgroup. It is apparent that the sub- 
jects shocked in the alley during con- 
finement were clearly superior with re- 
spect to strength of following during 
Days 7-10. 

Once again it is important to em- 
phasize that the protocols we had taken 
revealed that strength of following was 
related to several behavioral indices of 
emotionality. For example, animals 
shocked in the alley subsequently 
emitted a great many distress calls 
when they were not near the test ob- 
ject and would often become startled 
suddenly for no apparent reason. This 
startle behavior and the defecation 
which frequently accompanied it were 
exhibited considerably less often by 
the other birds during their exposures 
to the object. These results are 
clearly in accord with theoretical
expectation, since they indicate that level
of emotionality is an important vari- 
able determining the strength at which 
the imprinting response will be main- 
tained. 

The analysis that has been presented 
here has used an epigenetic approach 
as its point of departure in attempting 
to specify the manner in which intrin- 
sic and extrinsic factors participate in 
the development and organization of 
the imprinting response. Admittedly, 
much additional research, conducted 
on a variety of avian species, is
required before the fruitfulness of this
attempt can be definitely assessed. 
However, an epigenetic approach 
seems to possess so much empirical 
and theoretical promise for an under- 
standing of the imprinting phenome- 
non that an endeavor of this kind 
should be rewarding for both zoology 
and psychology alike. 


A FUNDAMENTAL PROPERTY OF ALL-OR-NONE
MODELS, BINOMIAL DISTRIBUTION OF
RESPONSES PRIOR TO CONDITIONING,
WITH APPLICATION TO CONCEPT
FORMATION IN CHILDREN ¹


PATRICK SUPPES AND ROSE GINSBERG
Stanford University


A basic assumption of the simple all-or-none conditioning model is
that the probability of a correct response remains constant over trials
before conditioning. 4 implications of this assumption were tested:
(a) prior to the last error there will be no evidence of learning, (b)
the sequence of responses prior to the last error forms a sequence of
Bernoulli trials, (c) responses prior to the last error exhibit a binomial
[...] in adults, and T maze learning in [...] statistical [...]
experimental groups provided substantial support of the [...]


¹ This research was performed pursuant to a
contract with the United States Office of
Education, Department of Health, Education,
and Welfare.


In the past year or two there has
been extensive application of a single
stimulus element conditioning model
to paired-associate learning (Bower,
1961; Estes, 1961) and to concept
formation in children (Suppes &
Ginsberg, 1962). In a paired-associate
experiment the single stimulus element
represents a stimulus item from a list
of paired associates; in a concept
formation experiment the stimulus
element represents a concept, or some
aspect of a concept. The two essential
assumptions of the model are the
following. First, until the single
stimulus element is conditioned, there
is a constant guessing probability, p,
that the subject responds correctly
(the probability of an error on every
trial is q = 1 - p). Second, on each
trial there is a constant probability, c,
that the single stimulus element will
be conditioned to the correct response.
We consider only those situations in
which the subject is always informed
of the correct response so that the
correct association may be learned on
any trial.
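
The two assumptions just stated can be illustrated by a minimal simulation sketch. It is not part of the original analysis: the function names, the parameter values (p = .5, c = .2), and the convention that conditioning occurring on one trial takes effect only from the next trial on are assumptions adopted purely for illustration.

```python
import random


def simulate_subject(p, c, n_trials, rng):
    """Simulate one subject under the all-or-none model.

    Returns a list of 0/1 responses (1 = correct). Assumptions: the subject
    starts unconditioned, and conditioning that occurs on a trial takes
    effect from the following trial on.
    """
    conditioned = False
    responses = []
    for _ in range(n_trials):
        if conditioned:
            responses.append(1)  # conditioned state: always correct
        else:
            # unconditioned state: guess correctly with probability p
            responses.append(1 if rng.random() < p else 0)
            # the subject is always informed of the correct response,
            # so conditioning may occur on any trial with probability c
            if rng.random() < c:
                conditioned = True
    return responses


def proportion_correct(subjects, trial):
    """Proportion of correct responses on a given 1-indexed trial."""
    return sum(s[trial - 1] for s in subjects) / len(subjects)


rng = random.Random(0)  # fixed seed, for reproducibility only
subjects = [simulate_subject(p=0.5, c=0.2, n_trials=30, rng=rng)
            for _ in range(2000)]
print(proportion_correct(subjects, 1), proportion_correct(subjects, 30))
```

With these illustrative values the proportion correct should lie near p on Trial 1 and approach 1 on late trials, when nearly all simulated subjects are conditioned.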

This all-or-none conditioning model 
may be viewed as resulting from im- 
posing special restrictions on more 
general models of stimulus sampling 
theory. The statistics of this model 
have been analyzed in great detail in 
Bower (1961). Supplementary sta- 
tistics for a finite number of trials at 
the end of which not all subjects are 
conditioned have been given by Estes 
(1961) and Suppes and Ginsberg 
(1962). 

The point of the present paper is to 
make explicit a simple but funda- 
mentally important fact about the all- 
or-none conditioning model: the as- 
sumption of a constant guessing 
probability on each trial before
conditioning implies that there is a binomial
distribution, with parameter p, of
responses prior to the last error.² This
observation has three important con- 
sequences for the analysis of experi- 
mental data. First, it implies that the 
sequence of responses prior to the last 
error forms a sequence of Bernoulli 
trials. This null hypothesis admits at 
once the possibility of applying the 
many powerful statistics that are not 
applicable in the usual learning situa- 
tion for which the theory postulates 
dependence of responses from trial to 
trial. Second, the consideration of 
response sequences prior to the last 
error makes possible a deeper analysis 
of response data than do statistics 
which are averaged over subjects and 
are a function of the conditioning 
parameter c. When statistics are ex- 
pressed as a function of c and the data 
are analyzed in terms of all subjects 
regardless of whether or not they are 
conditioned, then it is often the case 
that the large number of correct re- 
sponses occurring after conditioning 
biases the statistics very favorably in
terms of the model. Third, the ob- 
servation that the distribution of re- 
sponses prior to the last error should 
be binomial permits generalization of 
the model to admit individual differ- 
ences in the conditioning parameter c, 
while retaining a uniform guessing 
parameter p. 
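
The restriction to responses prior to the last error can be made concrete with a short sketch. The helper names and the three response sequences below are hypothetical and serve only to show the bookkeeping: each sequence is cut strictly before its final error, and the pooled proportion correct then serves as an estimate of the guessing parameter p.

```python
def before_last_error(responses):
    """Return the responses strictly before the final error (coded 0).

    A sequence with no error contributes nothing to the analysis.
    """
    if 0 not in responses:
        return []
    last_error = len(responses) - 1 - responses[::-1].index(0)
    return responses[:last_error]


def pooled_proportion_correct(subjects):
    """Pool pre-last-error responses over subjects and estimate p."""
    pooled = [r for s in subjects for r in before_last_error(s)]
    return sum(pooled) / len(pooled) if pooled else float("nan")


# Invented sequences (1 = correct, 0 = error), for illustration only.
example = [
    [0, 1, 0, 0, 1, 1, 0, 1, 1, 1],  # last error on Trial 7
    [1, 0, 1, 1, 1],                 # last error on Trial 2
    [1, 1, 1],                       # no error: contributes nothing
]
estimate = pooled_proportion_correct(example)
print(estimate)
```

Because only trials before each subject's own last error enter the pool, subjects may differ in how quickly they condition without affecting the estimate of p, which is the sense of the third consequence noted above.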

These points may be emphasized by 
considering just one example of a 
familiar statistic for the model. Let 
P_n(11) be the joint probability of a
success on Trial n and on Trial n + 1.
It is easily shown that


P_n(11) = 1 - [1 - p^2(1 - c) - pc](1 - c)^{n-1}    [1]


² It is easy to demonstrate that it is
statistically incorrect actually to include the last
error in the analysis of response data. 


Consider now how much simpler this
quantity is if we know whether or not
the subject is conditioned. Let U_n
stand for the unconditioned state on
Trial n and C_n for the conditioned state
on that trial, etc. Then the conditional
probabilities are simply³


P_n(11 | U_{n+1}) = p^2        [2]
P_n(11 | U_n C_{n+1}) = p      [3]
P_n(11 | C_n) = 1              [4]


Moreover, except for a few trials after 
the last error when the subject may be 
unconditioned but guessing correctly, 
we know what state he is in. In 
particular, on all trials prior to the 
last error we know he is in the un- 
conditioned state and thus that the 
probability of two successes in a row 
should be p². Relative to the third
point above, it may be noted that if 
the data are summed over subjects, 
test of Equation 1 requires the 
assumption that all subjects have 
the same conditioning parameter c, 
whereas test of Equation 2 does not, 
and is compatible with the assumption 
of individual differences in condition- 
ing “propensity.” 
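
As a worked check, the unconditional probability of Equation 1 can be recovered from the three conditional probabilities by weighting each with the probability of its state. The sketch below assumes, beyond what is stated explicitly in the text, that every subject begins in the unconditioned state on Trial 1 and that conditioning occurring on Trial n takes effect on Trial n + 1.

```latex
\begin{align*}
P(U_{n+1}) &= (1-c)^{n}, \qquad
P(U_n C_{n+1}) = c\,(1-c)^{n-1}, \qquad
P(C_n) = 1-(1-c)^{n-1},\\[4pt]
P_n(11) &= p^{2}\,(1-c)^{n} \;+\; p\,c\,(1-c)^{n-1} \;+\; \bigl[\,1-(1-c)^{n-1}\bigr]\\
        &= 1 - \bigl[\,1 - p^{2}(1-c) - pc\,\bigr](1-c)^{n-1}.
\end{align*}
```

The dependence on c enters only through the state probabilities, which is why a test of Equation 2 on trials prior to the last error does not require a common conditioning parameter.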


STATISTICAL TESTS OF THE MODEL 


Once the observation has been made 
that according to the model responses 
prior to the last error have a binomial 
distribution, it is possible to consider 
a variety of goodness of fit tests for 
this assumption. The virtue of these 
goodness of fit tests is that in contra- 
distinction to the many statistics 
considered by Bower they permit a 
genuine statistical evaluation of the 
null hypothesis that the model fits the 
data. There are four goodness of fit
tests we believe to be of particular
importance. In introducing these
four tests, we want to emphasize that
we are not suggesting they are the
only tests or that they are the only
interesting ones. It seems to us,
however, that they do ask the four most
important questions suggested by the
“guessing” assumption of the model.
The statistical properties of these four
tests are well known in the literature
and do not need to be discussed here.
A good reference for the first two on
stationarity and order is Anderson
and Goodman (1957).


³ There are only three cases to consider,
namely, U_{n+1}, U_n C_{n+1}, and C_n, because U_{n+1}
implies U_n with probability one and C_n
implies C_{n+1} with probability one.

Stationarity. Perhaps the most 
striking feature predicted is that if 
data summed only over responses 
made prior to the last error are con- 
sidered, then there will be no evidence 
of learning over trials. Statistically 
this means the model predicts a 
binomial distribution of responses 
with the constant parameter p. From 
the standpoint of learning theory this 
is a particularly interesting prediction 
because of the classical emphasis on 
the mean learning curve. If the 
binomial assumption holds, the mean 
learning curve, when estimated over 
responses prior to the last error for 
each subject, will be a horizontal line. 
Empirical tests of this prediction in 
experiments concerned with children’s 
concept formation, animal learning, 
probability learning, and paired-as- 
sociate learning in human adults, are 
given below. The appropriate
statistical test for stationarity may
be formulated in terms of the null
hypothesis that there is no change in 
the proportion of correct responses 
over trials, In order to obtain 
adequate data it is necessary to con- 
sider blocks of trials. Letting, then, 
the variable ¢ run over blocks of trials 
the appropriate x? test is as follows: 


fee nO Met 
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where ¢ = 0, 1; n:(¢) is the number of 
correct (f = 1) or incorrect (¢ = 0) 
responses in Block ¢; n (t) is the total 
number of responses in Block ¢; n is 
the number of correct (or incorrect) 
responses summed over all blocks; 
and N is the total number of responses 
summed over all blocks. The x? 
statistic has the usual limiting dis- 
tribution with T — 1 degrees of 
freedom, where 7’ is the number of 
blocks of trials. If there are m > 2 
responses, the number of degrees of 
freedom is (m — 1)(T — 1). Under 
the restriction to two responses, the 
expression for x? may be simplified to 


xo 2 [Nm (t) — mn (t) F/nnan (i) 


thus eliminating the summation over #. 
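As an illustration only, the stationarity statistic can be sketched in Python. The function names and the list-based data layout are assumptions introduced here, not part of the original analysis; the first function follows the simplified two-response expression and the second the general double sum, so that the two can be checked against each other.

```python
def stationarity_chi2(correct_by_block, total_by_block):
    """Two-response stationarity statistic (simplified form).

    correct_by_block[t] plays the role of n_1(t), total_by_block[t] of n(t).
    Returns (chi2, df) with df = T - 1.
    """
    if any(n_t <= 0 for n_t in total_by_block):
        raise ValueError("every block needs at least one response")
    N = sum(total_by_block)
    n1 = sum(correct_by_block)
    n0 = N - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("need both correct and incorrect responses")
    chi2 = sum((N * n1_t - n1 * n_t) ** 2 / (n1 * n0 * n_t)
               for n1_t, n_t in zip(correct_by_block, total_by_block))
    return chi2, len(total_by_block) - 1


def stationarity_chi2_general(counts):
    """General form for m responses; counts[t][i] plays the role of n_i(t).

    Returns (chi2, df) with df = (m - 1)(T - 1).
    """
    T, m = len(counts), len(counts[0])
    block_totals = [sum(row) for row in counts]                       # n(t)
    resp_totals = [sum(row[i] for row in counts) for i in range(m)]   # n_i
    N = sum(block_totals)
    chi2 = 0.0
    for row, n_t in zip(counts, block_totals):
        for n_it, n_i in zip(row, resp_totals):
            pooled = n_i / N
            chi2 += n_t * (n_it / n_t - pooled) ** 2 / pooled
    return chi2, (m - 1) * (T - 1)
```

With made-up counts of 8, 6, and 9 correct responses in three blocks of 10, both functions should return the same value, which is a quick check that the simplification is algebraically equivalent to the double sum.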
Order. The second property follow- 
ing from the guessing assumption 
which it is critical and significant to 
test is that the sequence of responses 
prior to the last error does indeed 
form a sequence of Bernoulli trials, 
that is, that there is statistical in- 
dependence in the responses made 
from trial to trial. There are various 
ways of testing this assumption but 
it seems to us that the simplest and 
most direct is to test the null hy- 
pothesis that the dependence is zero 
order versus the hypothesis that the 
dependence is first order. Acceptance 
of the null hypothesis has the strong 
implication that we cannot predict 
responses better if we know whether 
the preceding response was correct or 
incorrect. The application of this 
test to many other sets of learning 
data has led to rejection of the null 
hypothesis at extremely high levels of
significance (often p < 10⁻⁵). Many
results of this sort are to be found in 
Suppes and Atkinson (1960). In 
terms of other experimental evidence 
this must be regarded as a sensitive 
test of the assumption of the statistical 
independence of responses.
The appropriate formulation of the
χ² test is as follows:

$$\chi^2 = \sum_{i,j} n_i\,\frac{\left( n_{ij}/n_i - n_j/N \right)^2}{n_j/N}$$

where j as well as i is 0 or 1; n_{ij} is the
number of transitions from State i to
State j; n_i = Σ_j n_{ij}; n_j = Σ_i n_{ij}; and
N is the total number of responses, as
before. Again, χ² has the usual
limiting distribution with (m − 1)²
degrees of freedom, where m is the
number of states; here, m = 2.
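The order test can be sketched the same way. In this hedged example the helper names are hypothetical, responses are coded 0 (error) and 1 (success), and N is taken to be the total number of tallied transitions so that the row and column margins sum consistently; that choice is an assumption of the sketch.

```python
def transition_counts(responses, m=2):
    """Tally n_ij, the number of transitions from state i to state j."""
    counts = [[0] * m for _ in range(m)]
    for prev, curr in zip(responses, responses[1:]):
        counts[prev][curr] += 1
    return counts


def order_chi2(counts):
    """Zero-order versus first-order dependence statistic.

    counts[i][j] plays the role of n_ij. Returns (chi2, df), df = (m - 1)^2.
    Assumes every row and column total is positive.
    """
    m = len(counts)
    row = [sum(r) for r in counts]                                   # n_i
    col = [sum(counts[i][j] for i in range(m)) for j in range(m)]    # n_j
    N = sum(row)  # assumption: N counts transitions, not raw responses
    chi2 = 0.0
    for i in range(m):
        for j in range(m):
            pooled = col[j] / N
            chi2 += row[i] * (counts[i][j] / row[i] - pooled) ** 2 / pooled
    return chi2, (m - 1) ** 2
```

For data from several subjects, one would tally transitions within each subject's sequence and add the resulting tables before calling `order_chi2`, so that no transition is counted across subjects.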
Distribution of responses. Granted 
the assumption that responses prior to 
the last error are binomially dis- 
tributed, it is natural to ask if these 
responses do indeed exhibit a bino- 
mial distribution. Because the num- 
ber of responses prior to the last error 
varies from subject to subject and 
because, unless the number of subjects 
is very large, insufficient data will be 
obtained by grouping subjects accord- 
ing to the number whose last error 
occurs on the same trial, the natural 
and practical way to test the hy- 
pothesis that the distribution is bino- 
mial seems to be the following. For 
each subject consider blocks of, say 
four, trials taken up to the highest 
multiple of four equal to or less than 
the total number of responses prior 
to the last error. So that, for example, 
if the last error for a subject occurred 
on Trial 28, we would include in this 
analysis the first six blocks of four 
trials. Over the total of such blocks, 
summed for all subjects, the frequency 
of occurrence of k errors, where 
k = 0,1, 2, 3, 4, provides our obtained 
frequencies. The proportion of cor- 
rect responses over the blocks of trials 
included in this analysis is the 
maximum likelihood estimate of p. 
Using this estimate we may obtain 
from the binomial distribution the
predicted frequencies. On the null
hypothesis that responses are statisti-
cally independent a standard χ² test
for goodness of fit of the obtained and
predicted frequencies is appropriate.
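The blocking procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions: each subject's responses prior to the last error are given as a list of 0s (errors) and 1s (successes), and the function names are hypothetical.

```python
from math import comb


def block_error_counts(sequences, block=4):
    """Tally blocks by number of errors and estimate p.

    sequences: one list of 0/1 responses (prior to the last error) per subject.
    Only complete blocks are used, up to the highest multiple of `block`.
    Returns (freq, p_hat), where freq[k] is the number of blocks with k errors.
    """
    freq = [0] * (block + 1)
    n_correct = n_total = 0
    for seq in sequences:
        usable = len(seq) - len(seq) % block
        for start in range(0, usable, block):
            chunk = seq[start:start + block]
            freq[block - sum(chunk)] += 1
            n_correct += sum(chunk)
            n_total += block
    return freq, n_correct / n_total


def binomial_expected(n_blocks, p_hat, block=4):
    """Predicted number of blocks with k = 0..block errors."""
    q = 1 - p_hat
    return [n_blocks * comb(block, k) * q ** k * p_hat ** (block - k)
            for k in range(block + 1)]


def chi2_gof(observed, expected):
    """Pearson goodness-of-fit statistic over the frequency classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

A subject whose last error falls on Trial 28 contributes 27 prior responses, hence six complete blocks of four, which matches the example in the text. Classes with small predicted frequencies would normally be pooled before applying `chi2_gof`.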
Distribution of sequences of re- 
sponses. In addition to considering 
the distribution of responses, we may 
analyze the data in a still more refined 
way by considering the distribution of 
sequences of responses. As in the case 
of the distribution of responses, the 
practical approach is to consider 
blocks of trials of relatively small 
length and look at the sequence of 
responses within those blocks. For 
example, if we look at blocks of four 
trials, then 0111 would represent a 
sequence on which the first response 
was incorrect and the subsequent 
three responses were correct. In four 
trials there are, of course, 16 different 
possible sequences of errors and suc- 
cesses. If we consider the relative 
frequency of every possible sequence 
of responses of this length, a χ² test of
goodness of fit may then be applied 
in exactly the manner appropriate to 
the distribution of responses them- 
selves. In connection with this test, 
it is important to remark that the 
goodness of fit of the distribution of 
these sequences provides a goodness- 
of-fit test for the kind of run statistics 
much studied in the literature of 
learning theory. The difficulty with 
the usual statistics derived for runs 
is that they occur in the context of 
statistical dependence in response 
sequences and, therefore, a simple 
goodness of fit test is not valid. 
Homogeneity of individual condition- 
ing parameters. On the assumption 
that all subjects have the same condi- 
tioning parameter c, Bower (1961) 
derived the following distribution for 
the trial n’ on which the last error 
occurs (essentially this distribution 
was derived earlier by Bush &
Mosteller, 1959):

$$\Pr(n' = k) = \begin{cases} bp, & \text{for } k = 0 \\ b(1 - p)(1 - c)^{k-1}, & \text{for } k > 0 \end{cases}$$

where

$$b = \frac{c}{1 - p(1 - c)}.$$

The test of the null hypothesis is
simply a test of the goodness of fit
of this predicted distribution. Be-
cause of the relative complexity of the
expressions for this distribution we
have found it convenient to estimate
c by a minimum χ² method. The
obtained minimum enables us to 
evaluate at once the goodness of fit 
of the assumption of homogeneity. 
When the number of subjects is small, 
a more sensitive test of significance 
needs to be used. 
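The distribution of the trial of the last error, and a crude minimum χ² estimate of c, might be sketched as below. Several things here are assumptions of the sketch rather than the procedure actually used: the distribution is taken in its standard one-element form, Pr(n' = 0) = bp and Pr(n' = k) = b(1 − p)(1 − c)^(k−1) for k > 0 with b = c / [1 − p(1 − c)]; the frequency classes are single trials plus one pooled tail; and the minimum is found by a simple grid search.

```python
def last_error_pmf(k, c, p):
    """Pr(n' = k): probability that the last error occurs on trial k."""
    b = c / (1 - p * (1 - c))
    if k == 0:
        return b * p
    return b * (1 - p) * (1 - c) ** (k - 1)


def class_probs(c, p, n_classes):
    """Probabilities for k = 0, ..., n_classes - 2 plus one pooled tail class."""
    probs = [last_error_pmf(k, c, p) for k in range(n_classes - 1)]
    probs.append(1 - sum(probs))
    return probs


def min_chi2_c(observed, p, steps=999):
    """Grid search for the c in (0, 1) that minimizes Pearson chi-square.

    observed: frequencies for the same classes as class_probs.
    Returns (c_hat, chi2_min).
    """
    n = sum(observed)
    best = None
    for s in range(1, steps + 1):
        c = s / (steps + 1)
        expected = [n * q for q in class_probs(c, p, len(observed))]
        if min(expected) <= 0:
            continue  # skip grid points where a class has no expected mass
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        if best is None or chi2 < best[1]:
            best = (c, chi2)
    return best
```

Because c is estimated from the same frequencies, one degree of freedom is lost in addition to the usual one, so K classes leave K − 2 degrees of freedom for the goodness-of-fit test.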

The empirical distribution of the
trial of the last error is a sufficient
statistic for estimating c and is the
only statistic that needs to be con-
sidered in which the conditioning
parameter c enters. For sequences of
trials after conditioning, where the
probability of a correct response is
unity, statistics such as the expected
errors before the first success, the
variance of this random variable, the
expected number of success runs, the
expected number of alternations of
errors and successes, etc., are totally
trivial and uninteresting and are
therefore best evaluated on the data
prior to the last error where they do
not depend upon c. In itself the
goodness of fit test for the distribu-
tion of last errors is a test of the null
hypothesis that subjects have a homo-
geneous conditioning propensity.


APPLICATIONS TO CONCEPT 
FORMATION IN CHILDREN 


Experiment on identity of sets. We
first apply the tests considered in the
preceding section to some unpublished
data of our own on the formation of
the concept of identity of sets in 48
children of first grade age. On each
trial the child’s task was to indicate
whether two sets, each consisting of
one, two, or three elements, were
identical or not. Specifically he was
instructed to press one of two buttons
when the stimulus pairs presented
were “the same” and the alternative
button when they were “not the
same.” A total of 48 subjects were
run through individual sessions of 56
trials on 28 of which the stimulus dis-
plays showed identical sets, and the
remaining 28 nonidentical sets. In
this experiment no stimulus display
on any trial was repeated for an
individual subject.

Fig. 1. Proportion of correct responses prior to last error and mean
learning curve (Identity of Sets experiment). [Curves: all trials; before
last error. Abscissa: blocks of 4 trials.]

Fig. 2. Empirical and predicted histogram of binomial distribution of correct
responses in blocks of four trials (Identity of Sets experiment).
[Abscissa: successes.]
Because young children make oc-
casional errors that are not necessarily
an indication of incomplete learning,⁴
we adopted a criterion of 16 successive
correct responses as evidence of learn-
ing. Errors occurring after this cri-
terion was met are ignored in the
analysis. The number 16 was chosen
because the stimuli were randomized
in blocks of eight trials with respect
to identity of ordered sets, identity
of nonordered sets, equipollence of
nonidentical sets, and nonequipollent
sets.

⁴ An explicit model that accounts for these
occasional errors after conditioning is needed,
but will not be pursued here.

In Figure 1 is shown the proportion 
of correct responses prior to the last 
error in blocks of four trials. For com- 
parison the mean learning curve is 
also shown in this figure. Note that 
the mean learning curve is obtained in 
the usual way by summing over all 
trials regardless of where the last 
error occurred. The χ² test of sta-
tionarity over blocks of four trials
leads to acceptance of the null hy-
pothesis (χ² = 4.95, df = 9, p > .80),
and confirms the hypothesis of a 
constant probability of a correct re- 
sponse over trials before conditioning. 

The test for order described above 
yields a χ² of 1.78 which, with 1 df, is
not significant (p > .10). This im- 
plies that we do have statistical 
independence of successive responses. 
This result is particularly impressive 
because the total number of responses 
considered in the test is N = 937. 

To examine the hypothesis that 
responses are binomially distributed 
over trials before the final error we 
compared the empirical frequency of 
occurrence of zero, one, two, three, 
and four successes in blocks of four 
trials over all subjects, with the pre- 
dicted binomial distribution. The 
maximum likelihood estimate of p is 
simply the proportion of successes 
prior to the last error and is p̂ = .803.
The result of the goodness of fit test
(χ² = 5.99, df = 2, p = .05) supports
the binomial assumption. The pre- 
dicted and obtained histograms are 
shown in Figure 2. 

The goodness of fit test on the dis-
tribution of specific sequences of
successes and failures over blocks of
four trials, as described above, also
yielded a nonsignificant χ² value
(χ² = 11.06, df = 6, p > .05). The
predicted and observed quantities are
shown in Table 1. Because of the
small entries in some rows, the follow-
ing rows were combined to yield a net
of 6 df: Rows 1-6, Rows 7-8, Rows
9-11.

TABLE 1

FREQUENCY DISTRIBUTION OF SEQUENCES
OF ERRORS AND SUCCESSES OVER
BLOCKS OF FOUR TRIALS

Response sequence    Obtained     Predicted
(0 = error)          frequency    frequency
0000                      0          0.35
1000                      0          1.42
0100                      0          1.42
0010                      4          1.42
0001                      0          1.42
1100                      5          5.78
1010                      0          5.78
1001                      5          5.78
0011                      5          5.78
0101                      7          5.78
0110                      7          5.78
1110                     22         23.56
1101                     30         23.56
1011                     31         23.56
0111                     29         23.56
1111                     86         96.06

Note. Identity of Sets experiment.
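As a rough check on the predicted column of Table 1, the frequencies can be recomputed from the binomial assumption. The sketch assumes p̂ = .803 as reported and takes the number of blocks to be 231, the sum of the obtained frequencies in the table; the function name is hypothetical.

```python
from itertools import product


def sequence_expected(n_blocks, p, block=4):
    """Predicted frequency of every 0/1 sequence of length `block`.

    Under independence with constant p, a sequence with k successes has
    probability p**k * (1 - p)**(block - k).
    """
    out = {}
    for seq in product((0, 1), repeat=block):
        k = sum(seq)
        key = "".join(str(x) for x in seq)
        out[key] = n_blocks * p ** k * (1 - p) ** (block - k)
    return out


expected = sequence_expected(231, 0.803)
for key in ("0000", "1000", "1100", "1110", "1111"):
    print(key, round(expected[key], 2))
```

The recomputed values agree with the tabled ones to within rounding of p̂; every sequence with the same number of successes receives the same predicted frequency, which is why the predicted column takes only five distinct values.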

The statistical results for the dis- 
tribution of the last error were as 
follows. When the c value estimated 
from mean total errors was used, the 
fit of the distribution to the ob-
served data was very poor (ĉ = .061,
χ² = 26.71, df = 2, p < .001). When
c was estimated by a minimum χ²
method directly from the empirical
data on the distribution with four
frequency classes in the histogram,
the results were just significant at the
.02 level (ĉ = .035, χ² = 6.26, df = 1).
We return to these results later. 

Experiment on geometric forms. This
experiment is reported in detail in
Stoll (1962), and we present some of
the data here with her permission.
The subjects were 32 kindergarten
children who were divided into two
equal groups. For both groups the
experiment was a successive dis-
crimination, three-response situation,
with one group discriminating be-
tween triangles, quadrilaterals, and
pentagons, and the other group dis-
criminating between acute, right, and
obtuse angles. For all subjects a
typical case of each form was shown
immediately above the appropriate
response key. As in the previous
experiment no single stimulus display
was ever repeated for any one subject
and the stimulus displays representing
each form were randomized over
experimental trials. The subjects
were run to a criterion of nine correct
responses in any one session.

Fig. 3. Proportion of correct responses prior to last error and mean learning curve
(Quadrilateral and Pentagon concepts, Stoll experiment). [Curves: all trials;
before last error.]

For the quadrilaterals and pen- 
tagons the guessing probabilities prior 
to the last error were essentially the 
same, p̂ = .609 and p̂ = .600, re-
spectively, and the proportions of
correct responses for the combined 
data are presented in blocks of six 
trials, together with the mean learning 
curve, in Figure 3. 

Figure 4 presents the same curves
for the combined data for the three
types of angles, although in this
case the guessing probabilities varied
between the angles. Both figures
strongly support the hypothesis of a
constant guessing probability prior
to conditioning.

Fig. 4. Proportion of correct responses prior to last error and mean learning curve
(Acute, right, and obtuse angle concepts, Stoll experiment). [Ordinate: proportion
of correct responses. Abscissa: blocks of 6 trials. Curves: all trials; before last error.]

TABLE 2

STATIONARITY, ORDER, AND BINOMIAL
DISTRIBUTION RESULTS

                                        χ²     df    p >

Quadrilateral, p = .609
  Stationarity (N = 273)               1.68          .70
  Order (N = 262)                      0.65          .40
  Binomial distribution (N = 65)       0.92          .60
Pentagon, p = .600
  Stationarity (N = 275)               2.40          .60
  Order (N = 269)                      1.76          .15
  Binomial distribution (N = 65)       2.07          .35
Acute angle, p = .674
  Stationarity (N = 338)               7.96          .05
  Order (N = 348)                      3.17          .05
  Binomial distribution (N = 85)       2.66          .25
Right angle, p = .506
  Stationarity (N = 313)               6.34
  Order (N = 326)                      2.41          .10
  Binomial distribution (N = 80)      10.52          .001
Obtuse angle, p = .721
  Stationarity (N = 268)               1.10          .85
  Order (N = 256)                      7.32          .001
  Binomial distribution (N = 63)       2.90
Quadrilateral and pentagon, p = .604
  Stationarity (N = 548)
  Binomial distribution (N = 130)      1.77
All angles, p = .624
  Stationarity (N = 919)               0.97

Note. Stoll's experiment on Geometric Forms.

For each concept with enough ob- 
servations before the last error to 
permit statistical analysis, goodness- 
of-fit tests were performed for sta- 
tionarity over blocks of six trials, 
binomial distribution over blocks of 
four trials, and order. The results of 
these tests are presented in Table 2. 
In general, there were too few observa- 
tions to permit us to perform either 
the analysis of sequence of responses, 
or the test for homogeneity of con- 
ditioning parameters, and these were 
omitted in all cases. The results 
given in Table 2 primarily support 
the constant guessing assumption 
of the all-or-none conditioning models. 
Only two of the statistics in the table 
are significant at the .01 level, the 
binomial distribution result for the 
right angles and the order test for the 
obtuse angles. The total Ns on which 
the tests are based seem sufficiently 
large not to attribute the null results 
to insufficient observations. To em- 
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Proportion of Correct Responses 


all trials 


before last 
error 
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Proportion of correct responses prior to last error and 


mean learning curve (Binary Number experiment). 


phasize this point, some additional 
results for combined concepts are 
given at the bottom of the table. 

There is one systematic tendency 
in the data that is not made evident 
by Table 2. Although the order tests 
are not significant except in the case 
of the obtuse angle, in every case the 
probability of a success following a 
success is slightly greater than the 
probability of a success following an 
error. The data are presented in 
Table 3. We shall discuss this point 
later. 

Binary number experiment. This 
experiment is reported in detail in 
Suppes and Ginsberg (1962). Five 
and 6-year-old subjects were required 
to learn the concepts of the Numbers 
4 and 5 in the binary number system, 
each concept represented by three 
different stimuli. There were two 
groups of subjects but we consider 
here only the one group of 24 subjects 
who were required to make an overt 
correction response following an in- 
correct response. Each of the six 


stimuli was displayed on 16 trials, 
randomized over the experimental 
sequence of 96 trials. On each trial 
the subject responded by placing one 
of two response cards upon the 
stimulus display. 

From test trial responses, after each
experimental session, it seemed evi-
dent that whereas some subjects
learned the concepts as such, others
learned only some of the specific
stimuli representing the concepts so
that, in effect, there were two sub-
groups of subjects. If the data from
both subgroups combined are analyzed
in terms of paired-associate learning
of six independent items, responses
before the final error for each item
will represent the unconditioned state
of that item for both kinds of subject,
and the stationarity assumption
should still hold. In Figure 5 are
shown the proportion of correct re-
sponses prior to the last error and the
mean learning curve, both presented
from the point of view of paired-
associate learning.

TABLE 3

PROBABILITY OF A SUCCESS FOLLOWING A
SUCCESS AND OF A SUCCESS FOLLOWING
AN ERROR FOR RESPONSES PRIOR
TO THE LAST ERROR

                    Probability of success following:
Concept             Success        Error
Quadrilateral         .63           .58
Pentagon              .63           .59
Acute angle           .70           .61
Right angle           .55           .46
Obtuse angle          .77           .60

Note. Stoll's experiment on Geometric Forms.

The data points are for individual 
trials. Because a total of only 16 
trials was run on each stimulus we 
have adopted a somewhat weaker 
criterion—six successively correct re- 
sponses—than was used in the analysis 
of the two preceding experiments. 
The proportions of correct responses
prior to the last error are, therefore,
shown in Figure 5 only for the first 10
trials. The test of stationarity over
blocks of single trials for the first 10
trials again supports the null hy-
pothesis (χ² = 8.00, df = 9, p > .30,
N = 844).

The remaining goodness of fit tests 
can only be performed on the com-
bined data from the two subgroups
(concept and paired-associate learning
subjects) if the guessing probabilities 
of the two groups are the same. 
Using the criterion that the concept 
had been learned if the responses to 
the last three presentations of each of 
the stimuli representing it were cor- 
rect, we divided the data into two 
parts. The data from the group 
meeting the criterion were arranged 
for concept learning analysis (in this 
case a two-item learning situation), 
the remaining data were assumed to 
represent paired-associate learning in- 
volving six items. The proportion of 
correct responses over all trials prior 
to the last error for the two sub- 
groups, with a criterion of six succes- 
sive correct responses maintained for 
the paired-associate learning sub- 
group, was .498 for the latter, and .688 
for the concept learning subgroup. 
We were therefore not able to perform 
the goodness of fit tests upon the 
combined data. However both sub- 
groups provided, individually, suffi- 
cient observations for four of the five 
tests described—order, stationarity, 
binomial distribution, and sequences 
of successes and failures—but not for 
the distribution of the trial on which 
the last error occurs. For the paired-
associate group over the first 10 trials
we had 81 cases, for the concept
formation group we had 21 cases with
48 trials in each. The test results for
each subgroup are reported in Table 4.

TABLE 4

GOODNESS-OF-FIT TESTS ON RESPONSES BEFORE THE FINAL ERROR FOR CONCEPT LEARNING
(TWO ITEM) SUBGROUP, AND PAIRED-ASSOCIATE (SIX ITEM) SUBGROUP

                           Two-item subgroup            Six-item subgroup
Goodness-of-fit tests      χ²     df    p >    Nᵃ       χ²     df    p >    Nᵃ
Stationarity              8.36     9    .40    357     11.26    8    .10    570
Order                              1    .30    427      1.06    1    .20    476
Binomial                   .93     2    .50    141      3.21    1    .05    277
Sequence of successes
  and errors              1.28          .90    141      7.22    2    .02    277

Note. Binary Number experiment.
ᵃ Number of observations on which test is based.

PATRICK SUPPES AND ROSE GINSBERG

TABLE 5

FREQUENCY DISTRIBUTION OF SEQUENCES OF ERRORS AND SUCCESSES FOR CONCEPT LEARNING
(TWO ITEM) SUBGROUP, AND PAIRED-ASSOCIATE (SIX ITEM) SUBGROUP

Two-item subgroup                                Six-item subgroup
Sequences*            Obtained    Predicted      Sequences*           Obtained    Predicted
(three trial blocks)  frequency   frequency      (two trial blocks)   frequency   frequency
000                       4          4.29        00                      78         68.70
001                       9          9.45        10                      50         69.25
010                      10          9.45        01                      74         69.25
100                      11          9.45        11                      75         69.80
110                      17         20.82
011                      23         20.82
101                      21         20.82
111                      46         45.90

Note. Binary Number experiment.
* 0 = error, 1 = success.
Of the eight test results listed, seven 
are nonsignificant and one, the good- 
ness of fit test for the distribution of 
specific sequences, in the paired-asso- 
ciate learning subgroup, is a borderline 
case (.05 > p > .02). The predicted 
and obtained frequencies of specific 
sequences of successes and errors for 
both subgroups are presented in Table 
5. The probabilities of a success 
following a success and following an 
error for the paired-associate subgroup 
are .55 and .50, respectively. For the 
concept learning subgroup the same 
conditional probabilities are .68 
and .72. 


OTHER APPLICATIONS 


We remarked earlier that the all-or- 
none-conditioning model we are con- 
sidering may be placed in the more 
general framework of stimulus sam- 
pling theory. The applications to 
concept formation in children just 
discussed represent only one of many 
possible areas of applications. Al- 
though concept formation has been a 
particular focus of our own experi- 


mental research, we have examined 
data from four published experiments 
of a different sort to give some in- 
dication of the model’s possible range. 
The first two experiments were run 
with adult human subjects and are 
reported here very briefly. The last 
two are T maze studies with rats, and 
we analyze the data in more detail. 
Goodnow “two-armed bandit” experi- 
ment. The apparatus used in this 
experiment and the general procedure 
employed are described in Goodnow 
(1955) and Goodnow and Pettigrew 
(1955). We make use here of the 
response data from this experiment as 
reproduced in Sternberg (1959), who 
describes the experiment as follows: 


The apparatus was a “two-armed bandit” 
with a choice on each trial between pressing 
the left and right key. The reward schedule 
was 100:0, with the response on the left key 
(A₁) always rewarded, and the response on
the right key (A₂) never rewarded. The sub-
jects, 77 Harvard undergraduates, were told
that on any particular trial one and only one
key would pay off. A subject in this experi-
ment participated until he reached a criterion
of 15 consecutive choices of the left key 
(p. 355). 


Using the criterion of 15 correct
responses adopted by Goodnow, we
found that only 33 of the 77 subjects
made their last error after Trial 7.
We therefore terminated the analysis
of stationarity at this point and the
proportion of correct responses prior
to the last error and the mean learning
curve shown in Figure 6 are for the
first seven trials. Although there is a
slight tendency for the proportion of
correct responses prior to the last error
to increase over these seven trials, the
result is not statistically significant
(N = 382, χ² = 9.97, df = 6, p > .10).
In comparison the results for the mean
learning curve itself are highly signifi-
cant when the same stationarity test
is applied on single trial blocks for the
first seven trials (N = 539, χ² = 34.54,
df = 6, p < .001). According to
Sternberg’s description, all subjects
were rewarded on the left side. The
data given in Figure 6 indicate that
subjects had only overcome a guessing
bias for the right side after the first
five trials. Primarily for this reason
we have not analyzed this experiment
in greater detail.

Bower paired-associate experiment. 
The present experiment is described 
in Bower (1961); in addition, Bower 
has made available to us the data 
pertinent to the statistical tests used 
in this paper. The experiment is 
described by Bower as follows: 


Twenty-nine undergraduates learned a list 
of ten items to a criterion of two consecutive 
errorless cycles. The stimuli were different 
pairs of consonant letters and the responses 
were the integers 1 and 2, each response 
assigned as correct to a randomly selected 
five stimuli for each subject. A response was 
obtained from the subject on each presenta- 
tion of an item and he was informed of the 
correct answer following his responses. The 
deck of ten stimulus cards was shuffled be- 
tween trials to randomize the presentation 
order of the stimuli (p. 258). 


Fig. 6. Proportion of correct responses prior to last error and
mean learning curve (Goodnow experiment). [Curves: all trials; before last error.]

Fig. 7. Proportion of correct responses prior to last error and
mean learning curve (Bower experiment).


The proportion of correct responses 
prior to the last error and the mean 
learning curve for the first seven trials, 
averaged over subjects and items, are 
shown in Figure 7. As is evident, at 
the end of seven trials the learning 
was nearly perfect—the proportion of 
overall correct responses being .96, 
but for those items not yet condi- 
tioned, the natural guessing prob- 
ability has remained close to .50 (the 
drop below .50 of the last point is not 
significant for it is based on only 11 
observations). As would be expected 
from the figure, the χ² test of sta-
tionarity is not significant (N = 549,
χ² = .97, df = 6, p > .95).

The test for order also was not
significant (N = 417, χ² = 3.41,
df = 1, p > .05), although as in the


case of some of the experiments al- 
ready considered, the results approach 
significance. Bower gives the ob- 
tained and predicted distributions of 
last error in his article, and the agree- 
ment between the two is excellent. 
The other goodness of fit tests were 
not performed because of insufficient 
observations. 

Galanter-Bush T maze experiments.
Four T maze experiments, together
with the complete response data, are
reported in Galanter and Bush (1959).
We analyze here the initial acquisition
data for Experiments III and IV for
which the authors report improved
apparatus and methodology based on
the results of the first two experiments.
In both experiments the animals were
run under a noncorrection procedure
and on each trial food was always
located in the right-hand goal cup.
For both experiments we used 10
successively correct responses as a
criterion of learning. The probability
of satisfying this criterion before
conditioning has occurred is extremely
small and, since occasional errors
continue to occur even after long
series of correct responses (as in the
case of young children), it seems
psychologically unrealistic to impose
a stricter criterion.
The proportions of correct responses
prior to the last error over the first
18 trials are presented in Figure 8 for
the 21 animals in Experiment III and
the 19 in Experiment IV. The mean
learning curves are omitted because
they are nearly identical for the first
18 trials with the curves given. As
the observed proportion of correct
responses before the final error was
.459 for Experiment III and .447 for
Experiment IV, we combined the data
from both experiments and Figure 9
shows the same curve for these data,
again over the first 18 trials. The
graphic evidence of stationarity for
responses prior to the last error is
strongly supported by the statistical
tests of stationarity which, together
with the results of the tests for order,
are shown in Table 6. The sta-
tionarity tests for the two experi-
mental groups and for the combined
data are performed for single trial
blocks over the first 18 trials; beyond
this point there are too few subjects
left on each trial to permit statistical
analysis.

Fig. 8. Proportion of correct responses prior to last error
(Experiments III and IV, Galanter-Bush T maze). [Panels: Exp. III (21 Ss);
Exp. IV (19 Ss). Ordinate: proportion of correct responses. Abscissa: blocks
of 2 trials.]

Fig. 9. Proportion of correct responses prior to last error on the combined data
of Experiments III and IV (Galanter-Bush T maze). [Abscissa: blocks of 2 trials.]

TABLE 6

RESULTS OF STATIONARITY AND ORDER
TESTS FOR EXPERIMENTS III AND IV

                              χ²      df    p >

Experiment III
  Stationarity               19.89    16    .20
  Order                        .89     1    .30
Experiment IV
  Stationarity                7.53    16    .95
  Order                        .90     1    .30
Experiments III and IV
(combined)
  Stationarity               20.11    16    .20
  Order                       1.97     1    .10

Note. Galanter and Bush T maze.

TABLE 7

FREQUENCY DISTRIBUTION OF SEQUENCES OF
ERRORS AND SUCCESSES IN BLOCKS OF
FOUR TRIALS

Response sequence    Obtained     Predicted
(0 = error)          frequency    frequency
0000                     10          8.89
1000                     14         10.51
0100                     11         10.51
0010                      9         10.51
0001                     11         10.51
1100                     13         12.45
1010                     10         12.45
1001                     12         12.45
0011                      9         12.45
0101                     15         12.45
0110                      6         12.45
1110                     13         14.73
1101                     15         14.73
1011                     20         14.73
0111                     17         14.73
1111                     17         17.45

Note. T maze experiments, III and IV combined.

The statistical tests for binomial
distribution of responses before the
final error (χ² = 2.24, df = 3, p > .50)
and for specific sequences of errors
and successes (χ² = 9.35, df = 14,
p > .80), estimated on the combined
data of Experiments III and IV with
202 observations, are also nonsignifi-
cant. The frequencies of specific
sequences of successes and errors in
blocks of four trials for the combined
data are listed in Table 7. The
probabilities of a success following a
success and following an error for
Experiment III are .57 and .52, re-
spectively. For Experiment IV the
same conditional probabilities are .59
and .54.

Of the goodness of fit test results 
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reported above for the two T maze 
experiments, none approaches signifi- 
cance so that the assumption of a 
constant guessing probability before 
the final error is strongly supported. 
As for the test for homogeneity of 
conditioning parameters, as the aver- 
age trial number on which the last 
error occurred was 21.14 for Experi- 
ment II] and 23.84 for Experiment IV, 
we were able to use the combined data 
which gave us sufficient observations 
to perform this test. We found the 
fit of the distribution of the last error 
to the observed data, with the conditioning
parameter c estimated by a
minimum χ², to be very poor (ĉ = .03,
χ² = 29.50, df = 1, p < .001). Inspection
of the actual distribution
indicates that the observed variance 
is much too small in relation to the 
observed mean trial of the last error 
for the model to fit very well. 


PRELIMINARY DISCUSSION 


We initiated the extensive analyses 
of the preceding pages to investigate 
two basic assumptions of the simple 
all-or-none conditioning model: first, 
the assumption that correct responses 
before the final error are binomially 
distributed, and second, that all 
subjects have the same conditioning 
parameter. To examine the first hy- 
pothesis we suggested four goodness 
of fit tests to be applied to responses 
before the final error; these were for 
stationarity, order—a test of statis- 
tical independence of responses from 
one trial to the next—binomial dis- 
tribution of responses, and finally the 
distribution of specific sequences of 
responses over small blocks of trials. 
To test the second assumption, that 
conditioning parameters are homo- 
geneous, we proposed a goodness of 
fit test for the distribution of trials on 
which the last error occurs. 




We applied the above tests to the 
data from seven experiments in vari- 
ous areas—three in children’s concept 
formation, two in adult human learn- 
ing, and two in animal learning. Six 
of the experiments were two response 
situations, one—in children’s concept 
formation—involved three responses. 
Because there were not in every case 
sufficient observations, we were not 
able to apply all the tests to the data 
from every experiment. In general 
we were able to analyze the animal 
and children’s concept learning ex- 
periments quite thoroughly, and the 
adult learning experiments rather 
more superficially. Two of the chil- 
dren’s concept formation experiments 
involved subgroups, and wherever 
possible the tests listed above were 
applied to the data of each subgroup 
separately. 

The test for stationarity was applied 
to the data from every experimental 
group and subgroup. In all we per- 
formed 16 such tests—3 of which 
were on combined data of subgroups 
which were also tested individually. 
In no case was the result of the good- 
ness of fit test significant. In so far 
as this test is concerned, the evidence 
that there is no change in the propor- 
tion of correct responses over trials— 
that the process before the final error 
is in fact stationary—appears to be 
substantial. We shall return to these 
results later. 

We were able to perform the good- 
ness of fit test for order in 11 cases, 
none of which were on data from 
combined groups. Of these test re- 
sults, 10 were not significant and 1 
was highly significant (.01 > p > .001).
The latter result was from a subgroup 
of the children’s three-response con- 
cept formation experiment (Geometric 
Forms). The foregoing results in 
general provide quite good evidence 
to support the hypothesis that responses
are independent over trials
before the final error. However, we 
earlier pointed out a systematic tend- 
ency in the data of at least one of the 
experiments reported which was not 
in accord with the statistical evidence 
of trial independence. In the six sub- 
groups of the Stoll experiment (see 
Table 3), the probability of a success 
following a success was in every case 
slightly greater than the probability 
of a success following an error. The 
same slight tendency is shown in the 
concept learning subgroup of the 
Binary Numbers experiment and in 
both animal learning experiments. 
On the other hand the inequality was 
reversed for the paired-associate group 
of the Binary Numbers and the 
Identity of Sets experiments. For the 
latter the probability of a success 
following a success was .81, the prob- 
ability of a success following an error 
.85. All the other probabilities re- 
ferred to are given earlier in the paper. 
We do not have available the two 
conditional probabilities for the Good- 
now Two-Armed Bandit or the Bower 
Paired-Associate experiments. 
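
The two conditional probabilities compared in this paragraph can be tallied directly from each subject's responses before the final error. The sketch below uses hypothetical response sequences (1 = success, 0 = error) and a hypothetical function name; pooling the transitions over subjects is an assumption about how the tabulation was carried out.

```python
def conditional_success_probs(sequences):
    """Pool first-order transitions over subjects.

    Returns (P(success | previous success), P(success | previous error)).
    """
    counts = {(prev, cur): 0 for prev in (0, 1) for cur in (0, 1)}
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[(prev, cur)] += 1
    after_success = counts[(1, 0)] + counts[(1, 1)]
    after_error = counts[(0, 0)] + counts[(0, 1)]
    p_after_success = counts[(1, 1)] / after_success if after_success else float("nan")
    p_after_error = counts[(0, 1)] / after_error if after_error else float("nan")
    return p_after_success, p_after_error


# Hypothetical pre-final-error responses for three subjects.
example = [
    [1, 0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 1, 1, 0],
]
p_after_success, p_after_error = conditional_success_probs(example)
```

Under trial independence the two returned values should differ only by sampling error; a consistent inequality in one direction across groups is the kind of tendency discussed above.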

The binomial goodness of fit tests 
were applied to 10 groups and of the 
10 test results 9 were not significant 
and 1—from a subgroup of a children’s 
concept formation experiment (Geo- 
metric Forms)—was again highly 
significant. Insofar as the test for 
distribution of specific sequences of 
responses is concerned, we were able 
to apply this test in only four cases, 
of which three gave nonsignificant 
results and one, again from a subgroup
of one of the children’s concept
formation experiments (Binary Numbers),
approached significance (.05
> p > .02). From these tests it ap- 
pears that the assumption of a con- 
stant probability of a correct response 
over responses before the final error, 
with the consequent implication of 


independence of trials and a binomial 
distribution of responses, is not un- 
reasonable. 

As far as the test for homogeneity 
of learning parameters is concerned, 
the results are by no means un- 
equivocal. We were able to consider 
this test for only three of the experi- 
mental groups—with each of the 
groups from a different area: one
involving animals; one, adult human
subjects; and the third, young children.
Bower reports that the fit of the 
observed and predicted distribution 
of trials on which the last error 
occurred was very good for the adult 
humans in the paired-associate learn- 
ing experiment. On the other hand, 
we found the fit to be very poor 
for the animal learning experiment 
(p < .001) and for the children’s 
concept formation experiment (Iden- 
tity of Sets) it was highly significant 
when we estimated the conditioning 
parameter value from the total errors, 
(p < .001) and only just significant 
at the .02 level when the parameter 
was estimated by a minimum x” 
method. 


STATIONARITY RECONSIDERED 


The data from the seven experi- 
ments we have examined indicate that 
the simple all-or-none conditioning 
model is a good first approximation 
to the actual response behavior in a 
reasonably wide class of situations. 
Of the various properties of the model 
we have statistically examined above, 
the property of stationarity of re- 
sponse probability prior to the last 
error is the most crucial for supporting 
the basic assumption that condition- 
ing occurs on an all-or-none rather 
than incremental basis.

A rather consistent phenomenon in
respect to the stationary learning
curves we present above is that of a
persistent, though not statistically
significant, tendency for the probability
of a correct response prior to
the last error to increase over trials. 
This observation naturally gives rise 
to the question of whether the method 
of statistical analysis used may not 
have been such as to miss what is, in 
fact, a genuine change of probability 
over trials. One possibility is the 
following. The individual subject 
may actually be making a higher pro- 
portion of correct responses towards 
the end of the sequence of trials prior 
to his last error. At the same time, 
as the data indicate, individual sub- 
jects are becoming conditioned at 
different rates. When the mean 
stationarity curve is constructed by 
averaging over a fixed block of trials 
for all subjects no account is taken of 
this individual difference. The result 
may be that the subjects who meet 
criterion early—and are therefore 
making more correct responses— 
favorably weight the total proportion 
of correct responses in the early trials 
and thus appreciably support the null 
hypothesis of stationarity, in spite of 
the actual fact of individual change 
over trials before the last error. 

FIG. 10. Vincent learning curves in quartiles for proportion of correct
responses prior to last error for five experiments. (Ordinate: proportion
of correct responses.)


To avoid this possible bias, therefore,
and to take into account individual
differences in trial numbers
of last errors we have constructed
Vincent-type learning curves for trials
prior to the last error. In Figures 10
and 11 the proportion of correct 
responses is presented over percentiles 
of trials prior to the last error, in- 
stead of the usual blocks of trials. 
Five of the experiments are shown in 
Figure 10 and 5 of the subgroups 
from the Stoll experiment are pre- 
sented in Figure 11. (The raw data 
from the Bower experiment were not 
available for this analysis.) To con- 
struct these curves the responses made 
by each subject, prior to his final 
error, were divided into quartiles. 
The first data point on each curve 
represents the proportion of correct 
responses in the first 25% of the re- 
sponses of all subjects. The second, 
third, and fourth data points similarly 
represent the further quartiles. At 
the far right of each figure is shown 
the criterion point C, where the response
proportion is, of course, one.
As the mean percentile of each of the 
four quartiles is 12.5%, 37.5%, 62.5%, 
and 87.5%, respectively, and C repre- 
sents the 100% point, the distance 
between Points 4 and C on the 
abscissa is one half of that between 
the quartiles themselves. 
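
The construction just described can be sketched as follows. This is a minimal illustration rather than the authors' procedure in detail: the rule for assigning responses to quartiles when a subject's sequence length is not divisible by four (here, by relative position in the sequence) is an assumption, as are the function name and the sample data.

```python
def vincent_quartile_curve(sequences):
    """Proportion of correct responses (1s) in each quartile of pre-final-error trials.

    Each sequence holds one subject's responses before the final error.
    Response i (0-based) of a sequence of length n goes to quartile (4 * i) // n.
    """
    correct = [0, 0, 0, 0]
    total = [0, 0, 0, 0]
    for seq in sequences:
        n = len(seq)
        for i, response in enumerate(seq):
            quartile = (4 * i) // n
            correct[quartile] += response
            total[quartile] += 1
    return [c / t if t else float("nan") for c, t in zip(correct, total)]


# Hypothetical subjects with different numbers of pre-final-error trials.
subjects = [
    [0, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1],
]
curve = vincent_quartile_curve(subjects)
```

Because every subject contributes to every quartile in proportion to the length of his own sequence, early and late learners are aligned on relative rather than absolute trial position.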

In the case of the Identity of Sets 
experiment the curve in Figure 10 is 
for the 38 subjects in the group who 
reached criterion, and in the case of 
the Binary Numbers experiment the 
curve is for those subjects who 
achieved concept mastery as defined 
in the earlier discussion of this 
experiment. 

Nonstationarity and concavity are 
the two striking characteristics of all 
five curves in Figure 10. Two of the 
five curves in Figure 11 also exhibit 
these properties, whereas the other 
three are approximately stationary. 

FIG. 11. Vincent learning curves in quartiles for proportion of correct
responses prior to last error for Stoll’s experiment. (Abscissa: quartiles.)


TABLE 8

RAW DATA AND STATIONARITY TESTS FOR VINCENT LEARNING CURVES PRIOR TO LAST ERROR,
WHERE N = TOTAL OBSERVATIONS, n(t) = TOTAL OBSERVATIONS IN EACH BLOCK t,
n₁(t) = SUCCESSES IN BLOCK t FOR t = 1, 2, 3, 4

Experiment            N    n(t)   n₁(1)  n₁(2)  n₁(3)  n₁(4)     χ²
Identity of sets     564   141    117    110    113    126     7.66
Binary numbers       416   104     64     69     71     85    11.01
Goodnow              436   109     49     49     59     86    34.09
Galanter-Bush III    396    99     41     43     56     71    23.40
Galanter-Bush IV     408   102     52     51     58     73    12.39
Quadrilaterals       260    65     43     41     36     41     1.75
Pentagon             260    65     40     35     41     39     1.33
Right angles         320    80     42     44     38     42      .95
Obtuse angles        252    63     45     34     48     51    12.63
Acute angles         340    85     50     49     54     71    16.43


The actual frequencies of correct
responses in each quartile for all experiments
are shown in Table 8,
together with the results of the χ² test
of stationarity, the test being performed
with the quartiles as the newly
defined trial blocks, so that each test
has three degrees of freedom. The
results exactly support the qualitative
summary of the curves just given
with the single exception of the
Identity of Sets experiment, for which
the initial guessing probability in the
first quartile is already close to one.
The Vincent curve for this latter group
is in appearance like the rest of the
curves in Figure 10, that is, concave
upward. The statistical significance
of the nonstationarity test for six 
of the curves is in sharp contrast to 
the nonsignificant results presented 
earlier for the same experiments. In 
the earlier case the stationarity test 
was performed on an initial segment of 
trials and in that segment responses 
from the last quartile of subjects who 
conditioned in the early trials were 
averaged with responses from those 
who conditioned after a relatively 
large number of trials. It will be 
noted from Figures 10 and 11 that the 
nonstationary curves are relatively 
stationary for the first two or three 
quartiles, which we suggest explains 
the excellent statistical results for the 
one-element model considered earlier. 
A rough but intuitive way of putting 
it is that the one-element all-or-none 
conditioning model seems to be ac- 
counting for about two-thirds to 
three-fourths of the data. 
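
The quartile stationarity test of Table 8 can be illustrated with a short calculation. The sketch below assumes the statistic is the ordinary Pearson χ² for homogeneity of a success/failure table across the four quartile blocks, with the common success probability estimated from the pooled data; that formula and the function name are assumptions, since the text does not restate the formula at this point. Only a selection of rows from Table 8 is used.

```python
def stationarity_chi_square(n_per_block, successes):
    """Pearson chi-square for a constant success probability across blocks.

    n_per_block: number of observations in each block (equal across blocks).
    successes: number of successes observed in each block.
    """
    blocks = len(successes)
    p_hat = sum(successes) / (n_per_block * blocks)
    expected_success = n_per_block * p_hat
    expected_failure = n_per_block * (1 - p_hat)
    chi_square = 0.0
    for s in successes:
        chi_square += (s - expected_success) ** 2 / expected_success
        chi_square += ((n_per_block - s) - expected_failure) ** 2 / expected_failure
    return chi_square


# Selected rows of Table 8: n(t) and the successes in quartiles 1 to 4.
rows = {
    "Goodnow": (109, [49, 49, 59, 86]),
    "Galanter-Bush III": (99, [41, 43, 56, 71]),
    "Quadrilaterals": (65, [43, 41, 36, 41]),
    "Right angles": (80, [42, 44, 38, 42]),
}
results = {name: stationarity_chi_square(n, s) for name, (n, s) in rows.items()}
```

With four blocks and one estimated probability the statistic has three degrees of freedom, as stated in the text.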

On the other hand it is equally 
evident that the nonstationary con- 
cave curves of Figures 10 and 11 
cannot be accounted for by the simple 
all-or-none conditioning model, which 
predicts a horizontal straight line. 
It is not immediately clear what kind 
of model will fit these curves with any 
accuracy. The problem is com- 
plicated by the fact that a percentile 
scale on the abscissa is not the kind of 
scale ordinarily used in plotting learn- 
ing curves. The remainder of this 
paper is devoted to a brief examina- 
tion of this problem in which we show 
that a certain sort of two-element 
stimulus sampling model can account 
for the obtained empirical curves.

The two-element model we consider 
may be psychologically conceptualized 


as follows.⁵ There are two stimulus
elements or patterns associated with
each experimental situation. With
equal probability exactly one of the
two elements is sampled on every
trial. Let us call the elements σ and τ.
When either element is unconditioned
there is associated with it a guessing
probability g_σ or g_τ, as the case may
be, that the correct response will be
made when that unconditioned stim- 
ulus is sampled. The assumption of 
particular importance to the present 
model—and one that is not familiar 
in the literature—is that the prob- 
ability of the sampled stimulus ele- 
ment becoming conditioned is not 
necessarily the same when both 
elements are unconditioned as it is 
when the nonsampled element is 
already conditioned. We call the first 
probability a and the second b′ (the
reason for the prime will become 
evident in a moment). 

Under these assumptions, together 
with appropriate general independence 
of path assumptions as given, for 
example, in Suppes and Atkinson 
(1960, p. 5), the basic learning process 
may be represented by the following 
four state Markov process, where the 
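four states (σ, τ), σ, τ, and 0 represent
the possible states of conditioning of
the two stimulus elements.

\[
\begin{array}{c|cccc}
              & (\sigma,\tau) & \sigma   & \tau     & 0   \\ \hline
(\sigma,\tau) & 1             & 0        & 0        & 0   \\
\sigma        & b'/2          & 1 - b'/2 & 0        & 0   \\
\tau          & b'/2          & 0        & 1 - b'/2 & 0   \\
0             & 0             & a/2      & a/2      & 1-a
\end{array}
\]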


⁵ The intuitive idea of this model originated
in a conversation between the first author,
Gordon Bower, and Frank Restle.


Because we do not attempt experimentally
to identify the stimuli σ and
τ, this Markov process may be collapsed
into a three-state process,
whose states are simply the number of
stimuli conditioned to the correct
response. Here it is convenient to
replace b′/2 by b and we obtain the
transition matrix

\[
\begin{array}{c|ccc}
  & 2 & 1   & 0   \\ \hline
2 & 1 & 0   & 0   \\
1 & b & 1-b & 0   \\
0 & 0 & a   & 1-a
\end{array}
\qquad [5]
\]
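
The collapse of the four-state process into the three-state process can be checked mechanically. The sketch below is a small illustration under stated assumptions: the parameter values are hypothetical, the function name `lump` is introduced only for this example, and the element states are indexed in the order used in the four-state matrix above.

```python
from fractions import Fraction


def lump(matrix, partition):
    """Collapse a Markov transition matrix over a partition of its states.

    Raises ValueError if the chain is not lumpable, that is, if two states in
    the same block disagree on the total probability of moving into some block.
    """
    lumped = []
    for block in partition:
        rows = [
            [sum(matrix[i][j] for j in target) for target in partition]
            for i in block
        ]
        if any(row != rows[0] for row in rows):
            raise ValueError("chain is not lumpable over this partition")
        lumped.append(rows[0])
    return lumped


# Hypothetical parameter values; b_prime stands for b' in the text.
a, b_prime = Fraction(1, 10), Fraction(3, 5)
half = Fraction(1, 2)

# State order: both elements conditioned, first only, second only, neither.
four_state = [
    [1, 0, 0, 0],
    [b_prime * half, 1 - b_prime * half, 0, 0],
    [b_prime * half, 0, 1 - b_prime * half, 0],
    [0, a * half, a * half, 1 - a],
]

# Blocks correspond to 2, 1, and 0 conditioned elements.
three_state = lump(four_state, [[0], [1, 2], [3]])
```

The two single-element states have identical block-to-block transition probabilities, which is why nothing is lost by counting conditioned elements; the middle row of the result contains b = b′/2.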


Moreover, accompanying the states 0
and 1 we have the guessing probabilities
g₀ and g₁ defined in the obvious
manner in terms of the sampling
probability ½ and the guessing probabilities
g_σ and g_τ:

\[
g_0 = \tfrac{1}{2} g_\sigma + \tfrac{1}{2} g_\tau
\]
\[
g_1 = \tfrac{1}{4} g_\sigma + \tfrac{1}{4} g_\tau + \tfrac{1}{2} = \tfrac{1}{2} g_0 + \tfrac{1}{2}
\]

The probabilities g_σ and g_τ are not observable,
but g₀ is, and g₁ is a simple
function of it. This means that we
have a process with three free parameters,
the conditioning parameters
a and b, and the guessing probability
g₀.
To obtain a simple expression 
roughly corresponding to the concave 
curves of Figures 10 and 11, we give 
here the probability of a correct response
on Trial j (Event A_{1,j}) given
that State 2 (both stimuli conditioned)
was entered on Trial N (Event 2*_N,
j < N).

P(A_{1,j} | 2*_N) = …          [6]

where α = (1 − a)/(1 − b). It is easily shown
that the learning curve derived from
Equation 6 is concave upward when
b > a, and convex upward (the
standard result) when b < a.
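
The direction of curvature can be checked numerically without a closed-form expression. The sketch below enumerates the possible paths through the three-state process directly. It rests on an indexing convention that is an assumption of this example and not necessarily the one used for Equation 6: the subject is in State 0 on Trial 1, transitions occur at the end of a trial, and State 2 is first occupied on Trial N, so that Trials 1 to k are spent in State 0 and Trials k + 1 to N − 1 in State 1. The function name and parameter values are hypothetical.

```python
def p_correct_before_absorption(j, N, a, b, g0):
    """P(correct on trial j | State 2 first occupied on trial N), for 1 <= j < N.

    k is the number of trials spent in State 0 (k = 1, ..., N - 2); a path with
    a given k has probability (1-a)**(k-1) * a * (1-b)**(N-2-k) * b.
    """
    if N < 3 or not (1 <= j < N):
        raise ValueError("need N >= 3 and 1 <= j < N")
    g1 = 0.5 * g0 + 0.5
    weights = {
        k: (1 - a) ** (k - 1) * a * (1 - b) ** (N - 2 - k) * b
        for k in range(1, N - 1)
    }
    total = sum(weights.values())
    # The subject is still in State 0 on trial j exactly when k >= j.
    p_state0 = sum(w for k, w in weights.items() if k >= j) / total
    return g0 * p_state0 + g1 * (1 - p_state0)


def second_differences(values):
    """Discrete second differences; positive means the curve bends upward."""
    return [values[i + 1] - 2 * values[i] + values[i - 1] for i in range(1, len(values) - 1)]


N, g0 = 12, 0.5
curve_b_gt_a = [p_correct_before_absorption(j, N, 0.05, 0.30, g0) for j in range(1, N)]
curve_b_lt_a = [p_correct_before_absorption(j, N, 0.30, 0.05, g0) for j in range(1, N)]
```

Under this convention the curve starts at g₀ and ends at g₁ in both cases; with b > a the rise is delayed until the last few trials, and with b < a it occurs early.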

The expression for the probability 
of a correct response on Trial j given 
that the last error occurred on Trial N
is considerably more complicated than
Equation 6, but it yields the same
concavity results for b > a, with
convexity for b < a. On the other 
hand, it is not difficult to prove that 
at least for the simplest incremental 
model, the one-parameter linear model 
in which the increment is constant 
independent of the response made, no 
such concavity results can be obtained. 

Further detailed analysis of data 
will be required to determine the 
adequacy of the two-element model 
proposed here. It is unfortunately 
not easy to make a good estimate of 
the three parameters. Moreover, the 
detailed statistical tests reported for 
the one-element model are not valid 
for the two-element model. Evalua- 
tion of the goodness of fit of the two- 
element model is consequently a 
difficult matter, and is not pursued in 
this paper. 

If a two-element model does turn 
out to give a good account of the kind 
of experimental data analyzed in this 
paper, it may be thought of as a 
conceptual compromise between in- 
cremental and all-or-none condition- 
ing models. The conditioning is 
all-or-none for each of the two stim- 
ulus elements, but the probability of a 
correct response prior to the last error 
will be at two different levels, go and 
gı, during the sequence of trials prior 
to criterion. 


SUMMARY 


A basic assumption of the simple 
all-or-none conditioning model is that 
the probability of a correct response 
remains constant over trials before 
conditioning. In this paper we have 
examined that assumption and some 
of its implications, in some detail. 
The implications we have specifically
tested, using data from a number of
different experimental areas to do so,
are these: over data prior to the last error
there will be no evidence of learning
over trials; responses prior to the last
error form a sequence of Bernoulli trials;
responses prior to the last error exhibit
a binomial distribution; and
specific sequences of errors and successes
are distributed in accordance
with the binomial hypothesis. Four
goodness of fit tests were used to 
evaluate the above implications and 
these are described in detail in the 
preceding pages. The four tests were 
performed on the data from seven 
experiments concerned with concept 
formation in children, paired-associate 
learning and probability learning in 
adults, and T maze learning in rats. 
The statistical evidence from these 
various experimental groups provided 
substantial support of the above pre- 
dictions. In particular, insofar as the 
prediction of stationarity over trials 
before the final error is concerned, in 
no case was the goodness of fit test 
significant. 

As the statistical method used does 
not take into account individual dif- 
ferences in the trial number of the last 
error, and may therefore have biased 
the data in favor of the null hy- 
pothesis, a more refined test of the 
stationarity process was performed by 
constructing Vincent-type learning 
curves. These curves were, for the 
most part, concave upwards and 
significantly nonstationary, although 
a few of the curves remained station- 
ary. Neither the all-or-none single- 
element conditioning model discussed 
here, nor a simple incremental model 
accounts for concave upwards Vincent 
learning curves, but a simple two- 
element model was sketched which 
does predict such results. 


COMPARISON OF DIFFERENT POPULATIONS:
RESISTANCE TO EXTINCTION AND TRANSFER ¹


The problem is considered of comparing resistance to extinction when
a “correction” for systematic differences in initial extinction response
level is required. It is shown that the most commonly used correction
procedures have limited merit. 2 other approaches are taken up. A
specific mathematical model may be employed and some parameter of
the model used as a measure of resistance to extinction. The advantages
of this approach are balanced by its limited applicability.
The shape function method, which has considerable generality, is
presented and statistical test procedures are derived for it. Some
applications are also pointed out for other areas of learning.


Comparisons of extinction perform- 
ance frequently require that some ad- 
justment or correction be made for dif- 
ferences in initial extinction response 
level. It will be shown that the most 
frequently used methods for making 
such corrections are either incorrect or 
of limited applicability. A method of 
considerable generality which has been 
proposed by E. M. Beier will also be 
discussed. 

The nature of the problem can be 
seen in Figure 1 which shows the ex- 
tinction performance of two hypotheti- 
cal groups given different acquisition 
treatment. If mean response over any 
block of extinction trials were used as 
the dependent variable, Condition 1 
would have the lower score. This 
method of comparing resistance to ex- 
tinction, though not infrequent, does 
not necessarily answer the desired 
question since it may simply reflect the 
different acquisition levels. In order 
to avoid this objection, some way 
would have to be found to “correct” 
for the difference in response level at 
the beginning of extinction. The 
groups must, so to speak, be equalized
in respect of this latter variable. Such
equalization must be made in some
particular way, the particular way constitutes
or adumbrates some mathematical
model, and the justification for
the “correction” is to be sought in the
nature of the model.


¹ The author wishes to express appreciation
to E. M. Beier for many helpful comments
and criticisms.

The following two examples will 
make more explicit the conditions 
under which the problem arises. Con- 
sider first an experiment on stimulus 
generalization in which all subjects re- 
ceive the same acquisition training but 
different subgroups are extinguished 
under different, generalized CSs. 
Whether or not this experiment pre- 
sents a problem depends on what is to 
be inferred. If it is only desired to 
show that the response is weaker the 
greater the difference between the ac- 
quisition and extinction CSs, any com- 
monsensible descriptive statistic should 
be appropriate. The presumptive dif- 
ference between subgroups on the first 
extinction trial is precisely what is to 
be tested for. The mean response over 
the first several extinction trials would 
probably serve as well although it con- 
founds initial extinction level and rate 
of extinction. On the other hand, if 
it were desired to show, for instance, 
that generalized CRs extinguish faster, 
some method of correcting for the
initial response level in extinction 
would have to be applied. 

For the second example, consider 
different acquisition conditions (e.g., 
various degrees of partial reinforce- 
ment), all followed by a common ex- 
tinction treatment. Here also the 
proper procedure depends on the ques- 
tion at hand. The practical behavior 
control problem of squeezing the most 
extinction responses out of the or- 
ganism is to be looked at in the obvi- 
ous way. However, this problem has 
marginal scientific interest. In gen- 
eral, some means must be found which 
takes into account the systematic dif- 
ferences between conditions at the be- 
ginning of extinction. Without a 
satisfactory correction procedure, the 
question of differential resistance to 
extinction will not have a satisfactory 
answer. Of course, there is no guar- 
antee that a satisfactory answer exists. 

Since extinction may be considered 
as a special case of transfer, the same 
considerations are relevant to this 
more general problem. For conveni- 
ence of exposition, only extinction will 
be considered in detail. Transfer and 
some related problems will be dis- 
cussed more briefly in a later section. 

The subsequent discussion assumes a 
situation with discrete trials and, un- 
less otherwise stated, a numerical re- 
sponse measure such as speed or am- 
plitude. Free responding situations 
may be reduced to this case by con- 
sidering an average of the response 
within successive time intervals. Cate- 
gorical response data may be similarly 
reduced by taking response frequency 
over blocks of trials. 

It will also be assumed, unless other- 
wise stated, that a method of correct- 
ing for initial extinction level is in fact 
desired. Finally, it should be noted 
that the use of the term, correction, is 
not entirely appropriate since nothing 


is being corrected. Rather, a descrip- 
tive statistic of the data is sought 
which is answerable to the question 
proposed. 


USABLE METHODS


This section considers two general 
methods of comparing resistance to 
extinction whose rationale is known. 
Since no method can be expected to be 
universally applicable, knowledge of 
the assumptions involved in a given 
method is a necessary condition for its 
use. 

Parametric methods. The methods 
of this class assume some specific 
mathematical model for the extinction 
process, and some parameter (possibly 
more than one) of the model is se- 
lected as a measure of resistance to 
extinction. The model is then fit to 
the data and a test is made for between 
conditions differences in the selected 
parameter. 

As an illustration of the parametric 
methods, it is appropriate to consider 
the simple linear operator model. 
Thus, in Figure 1, both curves are 
simple decay curves of the form 


\[
R(n) = R(\infty) - [R(\infty) - R(1)](1 - \theta)^{n-1}, \qquad [1]
\]


where R(n) is the response on extinction
trial n, R(1) is the response
on extinction trial 1, R(∞) is the
asymptotic extinction response level,
and θ is the extinction rate.

Equation 1 may be rewritten in its 
difference equation form which re- 
lates the response on successive trials 
thus: 


\[
R(n+1) = R(n) - \theta\,[R(n) - R(\infty)]. \qquad [2]
\]


FIG. 1. Hypothetical extinction curves
for two different conditions of acquisition
training. (Initial points on curves represent
terminal levels in acquisition. Upper and
lower curves represent Conditions 2 and 1,
respectively.)


Equation 2 makes more evident the
meaning of the rate parameter. The
magnitude of the response decrement
produced on Trial n is R(n) − R(n + 1),
which is seen to equal
θ[R(n) − R(∞)]. Thus, θ is the
proportion of the total remaining decrement
by which the response decreases
on each trial. The basic idea
of the linear operator model goes back,
of course, to Hull (1943).
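
The relation between the closed form and the difference form of the simple linear model can be illustrated numerically. The sketch below is only an illustration: the initial response levels and the zero asymptote are hypothetical values, and the function names are introduced for this example; only the rate of 0.2 is taken from the discussion of Figure 1.

```python
def extinction_curve(r1, r_inf, theta, n_trials):
    """Closed form: R(n) = R(inf) - [R(inf) - R(1)] * (1 - theta)**(n - 1)."""
    return [
        r_inf - (r_inf - r1) * (1 - theta) ** (n - 1)
        for n in range(1, n_trials + 1)
    ]


def extinction_curve_recursive(r1, r_inf, theta, n_trials):
    """Difference form: R(n + 1) = R(n) - theta * [R(n) - R(inf)]."""
    responses = [r1]
    for _ in range(n_trials - 1):
        current = responses[-1]
        responses.append(current - theta * (current - r_inf))
    return responses


# Two hypothetical conditions with different initial levels and the same rate.
R_INF, THETA, TRIALS = 0.0, 0.2, 10
low = extinction_curve(4.0, R_INF, THETA, TRIALS)
high = extinction_curve(8.0, R_INF, THETA, TRIALS)
low_recursive = extinction_curve_recursive(4.0, R_INF, THETA, TRIALS)

# Absolute decrement on each trial, and the same decrement expressed as a
# proportion of the decrement still remaining.
absolute_low = [low[n] - low[n + 1] for n in range(TRIALS - 1)]
absolute_high = [high[n] - high[n + 1] for n in range(TRIALS - 1)]
proportional_high = [
    (high[n] - high[n + 1]) / (high[n] - R_INF) for n in range(TRIALS - 1)
]
```

The absolute decrements of the higher curve are twice those of the lower curve on every trial, while the proportional decrement equals the rate in both; this is the sense in which absolute change confounds rate with initial level.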

There are two parameters in Equation 1
which could serve as measures
of resistance to extinction: the rate
constant, θ, and the asymptotic extinction
level, R(∞). Since the latter
is usually the same on the average for
different conditions, θ would seem to
be the proper choice. However, it
should be noted that if R(∞) differed
from group to group, this fact would
probably be of primary interest, and a
comparison of θ values would have
secondary importance.

In Figure 1, θ is 0.2 for both conditions
and so both are equally resistant
to extinction. The θ values are of
course not directly related to the abso- 
lute decrements on each trial. This 
latter quantity is essentially the deriva- 
tive of the curve and confounds rate 
with initial level. 

Equation 2 is a linear operator model 
which has been widely used in various 
forms. For discrete or categorical re- 
sponse, R(n) is considered as response 
probability so that the response change 
on a given trial is independent of the
actual response on that trial. This
model has been used in statistical
learning theory (Estes, 1959) from
which the θ notation is borrowed. The
problems of estimating θ and testing
goodness of fit have been most in- 
tensively studied by Bush and Mostel- 
ler (1955) and Bush and Sternberg 
(1959). Bush and Mosteller also treat 
a more general class of linear operator 
models for discrete response in which 
the response change depends on the 
response that actually occurs. 

For numerical response measures, 
R(n) in Equation 2 can be considered 
either as the “true” response, or as 
the actual response, i.e., true re- 
sponse plus “error.” The response 
change depends on the actual response 
in the latter but not in the former case. 
Both cases have received a rigorous 
mathematical development (Anderson, 
1961). For convenience, only the 
former case will be explicitly con- 
sidered here although it should be 
noted that the two cases yield the same 
predictions except for certain variance 
and covariance expressions. Clark
(1959) and Weinstock (1958) have
used Equation 1 (in a somewhat different
form) in analyzing their extinction
data, and some information on
statistical problems is also given by 
Lewis (1960). 

The problems of testing goodness of 
fit and of estimating θ for this model
will not be discussed here since they 
are taken up in the cited references. 

Any other model would be treated 
in the same general way as the simple 
linear model. Any such model rests 
on relatively specific assumptions 
about the learning process. Unless 
the chosen model fits the data ade- 
quately, which does not often happen, 
the interpretation of the parameter 
estimates, and hence the assessment of 
resistance to extinction, will be somewhat
uneasy. It thus seems desirable
to seek a method which has greater 
generality. 

The simple linear model will, how- 
ever, be used as a reference point in 
the subsequent discussion. It would
seem to be at least approximately cor- 
rect in a number of situations. As a 
working rule, therefore, it will be re- 
quired here that any general method 
for comparing resistance to extinction 
include this model as a special case. 

Shape function method. The basic 
idea of this method is due to E. M. 
Beier and illustrations of its use may 
be found in Beier (1958) and Logan, 
Beier, and Kincaid (1956). The 
method is nonparametric (not requir- 
ing parameter estimation) and has 
considerable generality. The present 
section gives the mathematical basis 
for the method and derives three ways 
for making the statistical analysis. 
Limitations of the method will be taken 
up in a later section. 

The model for the method can be 
written in the following form: 


\[
R(n) = R(\infty) - [R(\infty) - R(1)]\,f(n), \qquad [3]
\]


where R(1), R(n), and R(∞) have
the same meaning as in the discussion
following Equation 1. The function,
f(n), is completely arbitrary, subject
only to the restrictions that f(1)
= 1, and f(∞) = 0. These boundary
conditions serve to reduce the f(n) 
to a standard form which is defined as 
the shape of the curve. 

All quantities in Equation 3 are as- 
sumed to vary with subjects so that 
in particular, each subject has a sepa- 
rate shape function, f(n). The indi- 
vidual R(n) and f(n) will denote 
either the true or observed values 
according to the needs of the moment 
which will be clear from the context. 
An explicit distinction, which is needed 
for a completely rigorous treatment, is 
rather cumbersome and is not necessary
for present purposes.

Equations 1 and 3 are similar in 
form. In fact, Equation 1 is a special 
case of Equation 3 as may be seen by 
setting f(n) = (1 — 8)". Equation 
3 is more general since it includes, for 
instance, the linear model in which 6 
itself changes from trial to trial. The 
value of the shape function method is 
that it includes a large class of spe- 
cific learning models but does not re- 
quire any further assumption about 
which particular one is appropriate to 
the data at hand.
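
The relation between Equations 1 and 3 may be illustrated by a short
computational sketch. It is offered only as an illustration: the function
names and numerical values are hypothetical, and Equation 1 is assumed to be
the simple decay curve obtained by setting f(n) = (1 - θ)^(n-1) in Equation 3.

```python
def decay_shape(n, theta):
    """Assumed simple decay shape: f(1) = 1 and f(n) -> 0 as n grows."""
    return (1.0 - theta) ** (n - 1)


def response(n, r_first, r_asymptote, shape):
    """Equation 3: R(n) = R(inf) - [R(inf) - R(1)] * f(n)."""
    return r_asymptote - (r_asymptote - r_first) * shape(n)


# Hypothetical values: R(1) = 20, R(inf) = 2, theta = 0.3, ten trials.
curve = [response(n, 20.0, 2.0, lambda m: decay_shape(m, 0.3))
         for n in range(1, 11)]
```

Any other function with f(1) = 1 that falls to 0 could be passed as `shape`
without changing `response`, which is the sense in which Equation 3 is the
more general form.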

In application of Equation 3, the 
null hypothesis to be tested is that the 
shape functions, f(n), are the same on 
the average over the various experi- 
mental conditions. If a simple decay 
curve does happen to fit the data, then 
this null hypothesis is identical to that 
of the simple linear model, namely, 
that the average θ values are the same
for the various conditions. However, 
it will not always be possible to in- 
terpret f(n) simply in terms of a learn- 
ing rate. 

Statistical analysis for the shape 
function method. There are three 
main ways in which to make the sta- 
tistical analysis for the shape function 
method. These analyses assume that 
all subjects receive N (or more) ex- 
tinction trials where N is some fixed 
number. 

The first and most straightforward 
statistical test is got by rewriting 
Equation 3 in the following form: 


f(n) = [R(\infty) - R(n)] / [R(\infty) - R(1)]. [4]


If estimates of R(1) and R(∞) are
available for each subject, substitution
of these values together with the
observed R(n) into Equation 4 yields a
row of numbers, which constitutes an
estimate of the
shape function for that subject. Ap- 
plication of this procedure for each 
subject yields a 2-way, Subjects ×
Trials, table of numbers for each
experimental condition. These tables are
the raw material for the statistical
analyses which may be performed in
any of the usual ways. For instance,
in a Conditions × Trials, repeated
measurements analysis (see Greenhouse
& Geisser, 1959; Lindquist,
1953), significance of Conditions or
the Conditions × Trials interaction
would reject the null hypothesis that 
the mean shape function is identical in 
all Conditions. 
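
The construction of the Subjects × Trials table may be sketched as follows.
The function name, the data layout, and the numbers are hypothetical; in
practice R(1) and R(∞) would be estimated separately for each subject, as
discussed in the text.

```python
def shape_estimates(responses, r_first, r_asymptote):
    """Equation 4 for one subject: f(n) = [R(inf) - R(n)] / [R(inf) - R(1)]."""
    span = r_asymptote - r_first
    if span == 0:
        raise ValueError("R(1) and R(inf) must differ")
    return [(r_asymptote - r) / span for r in responses]


# Hypothetical subjects in one condition: observed R(n) per trial together
# with that subject's own estimates of R(1) and R(inf).
subjects = [
    {"responses": [18.0, 12.0, 8.0, 5.0], "r_first": 18.0, "r_asymptote": 2.0},
    {"responses": [9.0, 6.0, 4.0, 3.0], "r_first": 9.0, "r_asymptote": 1.0},
]

# One row per subject, one column per trial.
table = [shape_estimates(s["responses"], s["r_first"], s["r_asymptote"])
         for s in subjects]
```

One such table per experimental condition would then be passed to whatever
repeated measurements analysis is chosen.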

It is not necessary to use all the 
trials in the statistical analysis. If N 
is large, inclusion of the later trials on 
which the shape functions will be 
nearly equal may serve no good pur- 
pose. In addition, Trials should not 
be included in the analysis unless the 
test of Conditions × Trials answers
some useful question. If it does not,
the mean (or sum) of the f(n) for
each subject may be used as the single
dependent variable, the test of this
quantity being equivalent to the test
of Conditions in the full Conditions
× Trials analysis. The mean of the
f(n) for a given subject over the first, 
say, N’ trials is obtained most simply 
by making use of the following equa- 
tion, got by summing Equation 4 over 
the first N’ trials and dividing by N’. 


(1/N') \sum_{n=1}^{N'} f(n) = [R(\infty) - (1/N') \sum_{n=1}^{N'} R(n)] / [R(\infty) - R(1)]. [5]


The right side of Equation 5, which 
is easily calculated from the data, is 
then the score for analysis. 
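
As a hedged illustration, the single score of Equation 5 can be computed
directly from a subject's raw responses. The function name and the data are
hypothetical, and the estimates of R(1) and R(∞) are taken as given.

```python
def mean_shape_score(responses, r_first, r_asymptote, n_prime):
    """Right side of Equation 5: mean of f(n) over the first n_prime trials."""
    mean_response = sum(responses[:n_prime]) / n_prime
    return (r_asymptote - mean_response) / (r_asymptote - r_first)


# Hypothetical subject: four extinction trials, R(1) = 18, R(inf) = 2.
score = mean_shape_score([18.0, 12.0, 8.0, 5.0], 18.0, 2.0, 4)
```

The result agrees with averaging the four individual f(n) values from
Equation 4, so the Subjects × Trials table need not be formed when only the
Conditions effect is of interest.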

The individual estimates of R(1)
and R(∞) should be reasonably
reliable and unbiased. Biased estimates,
which would result in particular from
using group means for R(1) and
R(∞), would produce a bias in the
shape function estimates which may
not be serious but has not been
investigated. Unreliability in the
estimates of R(1) and R(∞) will also
bias the shape function estimates since
these quantities appear in the
denominator of Equation 4. Such bias does
not affect the test of the null
hypothesis if its magnitude is the same in
all conditions. In practice, it would seem
that this should usually be
approximately true. Accordingly,
unreliability in R(1) and R(∞) would be
expected to have little effect on the
validity of the test. Of course, such
unreliability could considerably decrease
the power of the test.

It thus appears that the above test
is most appropriate when each
subject supplies a reasonable amount of
asymptotic acquisition data (which
would in many situations furnish a
direct estimate of R(1)) and a
reasonable amount of asymptotic
extinction data (which furnishes a direct
estimate of R(∞)). Unfortunately,
the former requirement may be incom- 
patible with the purposes of the ex- 
periment, and the latter requirement 
may necessitate a considerable num- 
ber of extinction trials which are not 
otherwise useful. It may, of course, 
be possible to extrapolate the pre- 
asymptotic data in both cases. 

In some situations, as in certain
eyelid or avoidance conditioning
experiments, R(∞) might be known a priori
to be zero. There would then be no
need for a large number of extinction
trials in order to get a good estimate
of R(∞). In these cases, it is seen
from Equation 5 that the trial mean of 
f(n) is equal to the mean number of 
extinction CR’s divided by the initial 
extinction level, R(1). 3 

The idea behind the second
statistical test may be stated as follows.
Suppose that two subjects have the same
shape function. Consider the trial on
which each subject's curve has pro- 
gressed a given proportion of the dis- 
tance from R(1) to R(N). This trial 
is the same for each subject. 

The derivation of the test procedure 
assumes that trial number is a
continuous variable, and that R(n) is a
continuous function of trial number. It
also assumes that the shape function 
is monotonic. The continuity assump- 
tion is largely a mathematical nicety 
except possibly when extinction is very 
rapid. If the monotonicity assumption 
were not satisfied (as in Grant, Rio- 
pelle & Hake, 1950; Weinstock, 1958), 
some modification of the test procedure 
would be required. 

Let k be any number between 0 and
1 and let n_k be the trial such that


R(n_k) = R(1) - k[R(1) - R(N)]. [6]


In other words, n_k is the trial on
which the response has changed by the
proportion k of the total change over
the first N trials. For k = 1/2, for
instance, n_k would be the trial on which
the response is midway between its
initial and final values. Equation 6
has a solution because of the
continuity assumptions.

Substituting the values of R(1),
R(n_k), and R(N) as given by
Equation 3, Equation 6 becomes after
simplification,


f(n_k) = k f(N) + (1 - k) f(1). [7]


Because f(n) was assumed to be a
monotonic function, it has a
single-valued inverse function, denoted by
f^{-1}, which has the property that
f^{-1}[f(n_k)] = n_k. Applying f^{-1} to
both sides of Equation 7 thus yields,


n_k = f^{-1}[k f(N) + (1 - k) f(1)]. [8]


From Equation 8 it is seen that n_k
depends only on the chosen k and on
the shape function. If two subjects
have the same shape function, they
have also the same values of n_k.
Hence also, if two conditions have the
same distribution of shape functions,
they have the same average n_k.
Equation 8 thus justifies the use of n_k
as the dependent variable in the
statistical analysis.

The statistical analysis itself is based
on Equation 6 and in its simplest form
would proceed as follows. The values
of R(1) and R(N) are estimated for
each subject. These estimates,
together with the chosen value of k, are
substituted into the right side of
Equation 6 to yield an estimate of R(n_k).
The observed response which is closest
to this estimate of R(n_k) is now found.
The trial number on which this
observed response occurs is an estimate
of n_k and this trial number is used as
the dependent variable.
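
The steps just described may be sketched as follows. The function name is
hypothetical, R(1) and R(N) are estimated here simply by the first and last
observed responses, and the rule for ties (the earlier trial is taken) is an
assumption, since the text does not specify one.

```python
def trial_at_proportion(responses, k):
    """Estimate n_k: the trial (numbered from 1) whose observed response is
    closest to the target R(1) - k * [R(1) - R(N)] of Equation 6."""
    r_first = responses[0]     # assumed estimate of R(1)
    r_final = responses[-1]    # assumed estimate of R(N)
    target = r_first - k * (r_first - r_final)
    # min() keeps the earliest trial when two trials are equally close.
    closest = min(range(len(responses)),
                  key=lambda i: abs(responses[i] - target))
    return closest + 1


# Two hypothetical subjects with the same shape function, 0.7 ** (n - 1),
# but different initial and asymptotic levels.
subject_a = [2.0 + 18.0 * 0.7 ** (n - 1) for n in range(1, 11)]
subject_b = [8.0 * 0.7 ** (n - 1) for n in range(1, 11)]
n_half_a = trial_at_proportion(subject_a, 0.5)
n_half_b = trial_at_proportion(subject_b, 0.5)
```

Both subjects yield the same trial number although their response levels
differ, which is the property that Equation 8 establishes.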

This test procedure has the
considerable advantage of not requiring
an estimate of the asymptotic
extinction level, R(∞). Consequently, any
number of extinction trials may be
run.

The power of the test may be in- 
creased in the following ways. The 
estimate of R(N) will be more reliable 
if it is taken as the mean response over 
a terminal block of trials. This is al- 
ways legitimate in the present test 
procedure, although it may defeat its 
purpose if the response is changing 
rapidly over the trials in question. 
For this reason, basing the estimate of 
R(1) on an initial block of extinction 
trials, although also legitimate, is 
probably not desirable in general. The 
power may be further increased by 
using several values of k, such as k
= 1/4, 1/2, 3/4, yielding several scores
for each subject. The mean of these
scores will be more reliable than any
one and may be used directly as the
dependent variable. Alternatively, the
several scores may be analyzed in a
repeated measurements design. 
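
A sketch of these two refinements, under stated assumptions, is given below.
The function name, the particular values of k, the size of the terminal
block, and the data are all hypothetical choices made for illustration.

```python
def mean_n_k_score(responses, ks=(0.25, 0.5, 0.75), terminal_block=3):
    """Mean of the estimated n_k over several values of k, with R(N)
    estimated by the mean response over a terminal block of trials."""
    r_first = responses[0]
    r_final = sum(responses[-terminal_block:]) / terminal_block
    trials = []
    for k in ks:
        target = r_first - k * (r_first - r_final)   # Equation 6
        closest = min(range(len(responses)),
                      key=lambda i: abs(responses[i] - target))
        trials.append(closest + 1)                    # trials numbered from 1
    return sum(trials) / len(trials)


# Hypothetical subject whose response has levelled off by the last trials.
score = mean_n_k_score([16.0, 8.0, 4.0, 2.0, 1.0, 1.0, 1.0])
```

Returning the individual trial numbers instead of their mean would give the
several scores needed for the repeated measurements alternative.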

The third statistical test will be men- 
tioned briefly since it has been dis- 
cussed in greater detail in another con- 
nection (Anderson, 1962). Suppose 
in general that each subject has a re- 
sponse curve, R(n), which may be 
written in the form 


R(n) = a + b h(n). [9]


Equation 9 may be considered as the 
shape function method in its general 
form. It includes Equation 3 as a 
special case as may be seen by setting 
a = R(∞), b = R(1) - R(∞), and
f(n) = h(n).

The null hypothesis to be tested is 
that the function h(n) is the same in
all conditions. If this null hypothesis 
is true, then Equation 9 implies that 
the response curves for all conditions 
will be linearly related to one another. 
The data for the test are the raw 
scores. If each subject has N scores, 
then the N × N between conditions
sums of products matrix has rank 2 
under the null hypothesis. The sta- 
tistical analysis is made by applying 
Fisher’s (1938) test of coplanarity. 
Numerical examples may be found in 
Rao (1952, pp. 364-375). 
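
The rank condition, though not the significance test itself, can be checked
numerically. The sketch below is not Fisher's test of coplanarity; it only
illustrates that mean curves sharing a common h(n) give a between conditions
sums of products matrix of rank 2. Equal group sizes are assumed, all values
are hypothetical, and the helper functions are illustrative.

```python
def between_conditions_ssp(mean_curves):
    """N x N sums of products of condition mean curves about the grand
    mean curve (equal group sizes assumed, so weights are omitted)."""
    n_trials = len(mean_curves[0])
    grand = [sum(col) / len(mean_curves) for col in zip(*mean_curves)]
    dev = [[x - g for x, g in zip(curve, grand)] for curve in mean_curves]
    return [[sum(d[i] * d[j] for d in dev) for j in range(n_trials)]
            for i in range(n_trials)]


def matrix_rank(matrix, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting."""
    rows = [list(map(float, r)) for r in matrix]
    rank = 0
    for col in range(len(rows[0])):
        if rank == len(rows):
            break
        pivot = max(range(rank, len(rows)), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < tol:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


h = [1.0, 0.6, 0.35, 0.2, 0.1]                  # common h(n), hypothetical
coefficients = [(2.0, 18.0), (1.0, 9.0), (4.0, 12.0), (0.5, 20.0)]  # (a, b)
curves = [[a + b * hn for hn in h] for a, b in coefficients]
rank_under_null = matrix_rank(between_conditions_ssp(curves))
```

Adding a condition whose curve follows a different h(n) raises the rank
above 2, which is the departure the coplanarity test is designed to detect.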

Strictly speaking, the test of the 
previous paragraph assumes that h(n)
is the same for all subjects within a 
given condition. If this assumption 
is not met, then Equation 9 introduces 
a bias in the mean response curves of 
magnitude, Covariance [b, h(n)]. 
This bias tends to cancel in the signifi- 
cance test and will, moreover, be small 
if the correlation between b and h(n) 
is small. Although there is thus some
reason to believe that the bias will not 
seriously affect the test, it is uncertain 
whether it should be used for com- 
paring resistance to extinction since 
extinction data often show extreme
individual differences. However, the
method may be more useful in other
types of transfer. 

This test procedure has the advan- 
tage that neither a nor b has to be 
estimated. As a consequence, it could 
be used to test similarity of curve 
shape in situations in which the first 
two test procedures might not be ap- 
plicable. It may be noted, in this con- 
nection, that a direct analog of the 
statistical test procedure developed in 
Equations 6-8 may also be derived 
from Equation 9. That test pro- 
cedure could thus be used in applica- 
tions of Equation 9 as long as the 
response curves were monotonic. 

It may be worth mentioning that 
application of orthogonal polynomial 
technique may reduce the computa- 
tional labor of the above mentioned test 
for coplanarity which is rather ardu- 
ous when N is large. Suppose that a 
fourth degree polynomial was ade- 
quate to give a reasonably good fit 
to each subject’s extinction curve. 
Each subject would then be given five 
derived scores as follows. The first 
derived score would simply be the 
sum of the N raw scores for that sub- 
ject. The second derived score would 
be obtained by multiplying each raw 
score by the corresponding coefficient 
of the linear orthogonal polynomial 
and adding the N products. (This de- 
rived score may be considered as a 
“slope” score for the given subject 
since it is proportional to the slope of 
the best fit straight line to that sub- 
ject’s data.) The third, fourth, and 
fifth derived scores would be obtained 
similarly by applying the coefficients 
for the quadratic, cubic, and quartic 
orthogonal polynomials, respectively, 
in the same way. These five derived 
scores would then be used to form a 
5 × 5 between conditions sum of
products matrix. This matrix would have
rank 2 under the null hypothesis and
its analysis would proceed as noted
above. The total labor, although still
respectable, would be less than that
required for the analysis of the N × N
matrix and it would also be expected 
that the statistical test would be more 
powerful. 
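
The derived scores may be sketched as follows. Instead of tabled integer
coefficients, the sketch builds the orthogonal polynomial vectors by
Gram-Schmidt orthogonalization of the powers of trial number, so the scores
differ from tabled values by a scale factor only. The function names and
data are hypothetical, and the number of trials must exceed the degree.

```python
def orthogonal_polynomials(n_trials, degree):
    """Orthogonal polynomial vectors over trials 1..n_trials, obtained by
    Gram-Schmidt on the powers 0..degree (vectors are not normalized)."""
    trials = [float(t) for t in range(1, n_trials + 1)]
    basis = []
    for d in range(degree + 1):
        vec = [t ** d for t in trials]
        for b in basis:
            coef = sum(v * x for v, x in zip(vec, b)) / sum(x * x for x in b)
            vec = [v - coef * x for v, x in zip(vec, b)]
        basis.append(vec)
    return basis


def derived_scores(raw_scores, degree=4):
    """Sum, linear, quadratic, ... scores for one subject's raw scores."""
    basis = orthogonal_polynomials(len(raw_scores), degree)
    return [sum(c * r for c, r in zip(b, raw_scores)) for b in basis]


# Hypothetical subject with a perfectly linear five-trial extinction curve.
scores = derived_scores([10.0, 8.0, 6.0, 4.0, 2.0])
```

The first score is the sum of the raw scores and the second is proportional
to the slope of the best fit straight line, as in the text; for this linear
example the higher order scores vanish.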

It may be noted that it would not be 
correct to apply the orthogonal poly- 
nomial trend analysis for repeated 
measures directly to the obtained data. 
There is a real difference between, for 
instance, the quadratic components of 
the two curves of Figure 1 even though 
they each have the same extinction 
rate. 


OTHER METHODS 


This section takes up some other 
methods which have not infrequently 
been used to measure resistance to
extinction but which are less satisfactory
than those of the previous section. 

Visual inspection. The method of 
visual inspection may be illustrated by 
the two extinction curves of the upper 
panel of Figure 2. It is so obvious 
that Condition 2 is the more resistant 
to extinction that a formal statistical 
test seems tediously unnecessary. In 
actual fact, these two curves are based 
on single rats, by no means atypical, 
from different experimental condi- 
tions whose mean extinction curves, 
shown in the lower panel of Figure 2,
are not significantly different
(Nakamura & Anderson, 1962).

Visual inspection is of course widely 
used and widely useful, both as a sub- 
stitute for and as an adjunct to a for- 
mal statistical test. However, the con- 
siderations which govern its use are 
complex and not within the scope of 
the present article. It may be noted, 
however, that the subjective assess- 
ment of the reliability of the data is 
considerably easier for the original in- 
vestigator than for the reader of the 
published work who usually has little
personal appreciation of the prevailing
variability.

FIG. 2. Extinction curves for two groups
of Long-Evans rats given 10 days
acquisition training at 1 or 2 minute
intertrial intervals. (See text.)

It should also be noted that admit- 
ting visual inspection as a legitimate 
method of comparing resistance to ex- 
tinction does not contradict the earlier 
assertion that all such methods pre- 
suppose some mathematical model. 
The subjective judgment rests on the 
belief, inarticulate though it may be, 
that no conceivable member of some 
more or less ill-defined class of models 
would lead to a different conclusion. 

No-correction method. In this 
method, a statistical analysis is made 
of some dependent variable based on 
the extinction data without making 
any allowance for initial differences in 
extinction. As noted in the
introduction, such dependent variables are
appropriate for certain questions. If they
are not appropriate, then this method 
is poorer than visual inspection since 
it puts a statistical finish on the an- 
swer to the wrong question. 

However, the no-correction method 
may sometimes justifiably be employed 
in conjunction with visual inspection.
The differences in initial level are 
sometimes quite small compared to the 
differences later in extinction. It 
might then be evident that application 
of the shape function method would 
have little relative effect on the ex- 
tinction scores in which case its use 
would be uneconomical. A similar 
situation arises when the initially 
higher of two extinction curves crosses 
over the other and lies at a lower level 
later in extinction, e.g., Humphreys 
(1939). A direct test on the differ- 
ence in response over the later trials 
would then be legitimate (assuming 
equal asymptotic extinction levels) 
since correction for the initial level 
would only increase this difference. It 
will be realized, of course, that selec- 
tion of the dependent variable after 
inspection of the data, although not 
necessarily to be condemned, does run 
the risk of capitalizing on chance 
fluctuations. 

Covariance. The popularity of this 
technique apparently arises from the 
impression that covariance “makes the 
groups equal” or “adjusts for between 
groups differences” on the covariate. 
It is true that when random assign- 
ment has been employed, covariance 
can be a powerful tool in adjusting 
(chance) sampling error and reducing 
error variability. However, if the be- 
tween groups differences on the co- 
variate are systematic rather than 
chance, one may well wonder what ex- 
actly it means to ask what the data 
would be like if they weren’t what they 
are. A more detailed discussion of 
these points is given by Smith (1957).
For present purposes, it is evident
that any adjustment must be based on 
some substantive model. Accordingly, 
it is necessary to see whether the sta- 
tistical model underlying covariance is 
psychologically appropriate to the 
problem at hand. 

The question may be answered by 
considering the particular case in 
which the extinction curves are simple 
decay curves. Since this is probably a 
reasonable assumption for at least 
some situations, covariance should be 
required to give a correct result. The 
earlier discussion provides a correct 
method for measuring resistance to 
extinction and it is to this that the 
results of a covariance analysis are to 
be compared. 

The following assumptions are 
made: Equation 1 holds; N extinction 
trials are run where N is so large that 
(1 - θ)^N is negligible; the dependent
variable is the mean response over the 
N extinction trials; and an unbiased 
estimate of R(1) is available for use 
as a covariate. 

By averaging Equation 1 over the N 
trials for a given subject we obtain 


\bar{R}(n) = [1 - 1/(N\theta)] R(\infty) + [1/(N\theta)] R(1), [10]

where R̄(n) denotes the mean
response over the N trials for the given
subject.
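
Equation 10 may be checked numerically. The sketch assumes that Equation 1
is the simple decay curve R(n) = R(∞) - [R(∞) - R(1)](1 - θ)^(n-1); the
function names and the numerical values are hypothetical.

```python
def mean_response(r_first, r_asymptote, theta, n_trials):
    """Exact mean over trials 1..N of the assumed simple decay curve."""
    total = sum(
        r_asymptote - (r_asymptote - r_first) * (1.0 - theta) ** (n - 1)
        for n in range(1, n_trials + 1)
    )
    return total / n_trials


def equation_10(r_first, r_asymptote, theta, n_trials):
    """Approximation that neglects (1 - theta) ** N."""
    weight = 1.0 / (n_trials * theta)
    return (1.0 - weight) * r_asymptote + weight * r_first


# Hypothetical values: R(1) = 20, R(inf) = 2, theta = 0.3, N = 60.
exact = mean_response(20.0, 2.0, 0.3, 60)
approx = equation_10(20.0, 2.0, 0.3, 60)
```

With N this large the two values agree closely; with few trials or a small
θ the neglected term would no longer be negligible.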

Suppose for the moment that, within
each experimental condition, all
subjects have the same θ value, and that
R(1) and R(∞) are uncorrelated.
Equation 10 is then the regression
equation of the dependent variable,
R̄(n), on the covariate, R(1). This
regression equation has a regression
coefficient of 1/(Nθ). If the various
conditions have different rates of
extinction, their regression lines will be
nonparallel. The test for parallelism
of regression lines thus yields a proper
test of differential resistance to
extinction in the special case being
considered. This conclusion is at variance
with the usual application of covariance 
in comparing resistance to extinction 
which assumes parallel regression lines 
and tests instead for differences in 
their elevation. It can be shown that 
this latter test would give an incorrect 
answer in the example of Figure 1. 
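
The nonparallel regression lines can be seen in a small noise-free sketch.
It assumes that Equation 10 holds exactly, that all subjects in a condition
share one θ, and that R(∞) is constant; the names and values are
hypothetical.

```python
def mean_extinction_response(r_first, r_asymptote, theta, n_trials):
    """Dependent variable given by Equation 10."""
    weight = 1.0 / (n_trials * theta)
    return (1.0 - weight) * r_asymptote + weight * r_first


def least_squares_slope(xs, ys):
    """Slope of the regression of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


n_trials = 50
covariate = [10.0, 15.0, 20.0]       # hypothetical R(1) values
slopes = {}
for theta in (0.2, 0.4):             # hypothetical extinction rates
    scores = [mean_extinction_response(r, 2.0, theta, n_trials)
              for r in covariate]
    slopes[theta] = least_squares_slope(covariate, scores)
```

The slopes equal 1/(Nθ) in each condition, so the faster extinguishing
condition has the flatter line and a test of elevations alone would be
asking a different question.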

If the pro tem assumptions of the 
previous paragraph are removed, the 
problem becomes quite difficult to 
handle because the variances and
covariances of R(1), R(∞), and θ all
affect the slopes of the regression lines. 
It does not seem worth while to pursue 
the analysis, not simply because of its 
difficulty, but also because it would 
have to be redone for each new de- 
pendent variable and for each new 
specific model considered. Accord- 
ingly, although it should not be con- 
cluded that the covariance model is in- 
correct, it is clear that it should not be 
used without explicit justification. 

Factorial designs. Many experi- 
ments use acquisition treatments and 
extinction treatments as factors in a 
2-way design, the dependent variable 
being some measure of resistance to 
extinction. Frequently, no correction
for initial extinction differences is
desired (e.g., Grant & Schneider,
1948). In that case, a direct analysis 
of the dependent variable is appropri- 
ate since the counterbalancing pro- 
vided by the design has nothing to do 
with correcting for initial level. Al- 
though it may not be necessary to 
buttress this last statement, a brief il- 
lustration may be useful in other ways. 
The discussion will be restricted to the 
results of a factorial design in a simple 
case and constitutes, to a certain ex- 
tent, a digression from the main pur- 
pose of the article. 

The example considered in the above 
discussion of covariance will be taken
up here also. Accordingly, it is
assumed that: Equation 1 holds; N
extinction trials are run where N is large
enough that (1 - θ)^N is negligible; the
dependent variable is the mean
response over the N extinction trials;
and also that R(∞) = 0. Under
these assumptions, the theoretical
expression for the dependent variable is
R(1)/(Nθ) as may be seen by setting
R(∞) = 0 in Equation 10.

Consider now a 2 × 2 design with 2
acquisition and 2 extinction
treatments. Suppose that initial extinction
level depends only on acquisition
treatment and let R_1(1) and R_2(1) denote
these two quantities. Suppose also
that extinction rate depends only on
extinction treatment and let θ_1 and θ_2
denote these two rates. The data will
then conform to the following 2 × 2
table:


R_1(1)/(N θ_1)    R_1(1)/(N θ_2)
R_2(1)/(N θ_1)    R_2(1)/(N θ_2)


where rows and columns represent ac- 
quisition and extinction treatments, 
respectively. Since the row sums are 
unequal and the column sums are un- 
equal, the statistical analysis would be 
expected to show effects of acquisition 
and of extinction treatment. If the 
purpose of the experiment required a 
correction for initial extinction level, 
one of the methods of the preceding 
section would be used. The row effect 
would then disappear as it should 
since in this example the acquisition 
treatment does not affect extinction 
rate.

The row and column effects simply 
represent the assumptions made. 
However, there is also an interaction 
since the difference between the entries 
in the first row does not equal the dif- 
ference between the entries in the sec- 
ond row. This may perhaps seem 
somewhat surprising since there is no 
psychological interaction of the
acquisition and extinction treatments.
Three aspects of the interaction require 
comment. 

First, as in this example, significant 
interaction may simply be a nuisance, 
with no psychological importance. 
Second, significant interaction does not 
necessarily prejudice the interpretation 
of the main effects despite statements 
to this effect in many statistics texts. 
Here the relative effect of acquisition 
treatment is in the same direction for 
each extinction treatment and vice 
versa. Third, as is well known, the
measuring scale of the dependent vari- 
able may have considerable effect on 
the size of the interaction. 

In the present example, it is possible
to make the interaction zero by
applying a log transformation since log
[R(1)/(Nθ)] = log R(1) - log Nθ. The
transformation would tend to increase 
the power of the statistical test by con- 
centrating the observed differences in 
the main effects where they belong. 
In tasks similar to that being con- 
sidered, the raw dependent variable is 
often positively skewed so that the log 
transform would also be expected to 
increase normality. Accordingly, it 
may be desirable to use this transfor- 
mation in such cases. However, if the 
effects of acquisition treatment were 
slight, a reciprocal transformation 
might accomplish the same ends and 
also yield a more natural scale of meas- 
urement. 
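
The effect of the log transformation on the interaction can be verified with
hypothetical cell values. The sketch assumes the theoretical cell entries
R(1)/(Nθ) without error; the names and numbers are illustrative only.

```python
import math


def cell(r_first, theta, n_trials):
    """Theoretical mean response R(1) / (N * theta) when R(inf) = 0."""
    return r_first / (n_trials * theta)


def interaction(table):
    """2 x 2 interaction contrast: row 1 difference minus row 2 difference."""
    return (table[0][0] - table[0][1]) - (table[1][0] - table[1][1])


n_trials = 50
initial_levels = (20.0, 10.0)    # hypothetical R_1(1), R_2(1)
rates = (0.2, 0.4)               # hypothetical theta_1, theta_2

# Rows are acquisition treatments, columns are extinction treatments.
raw = [[cell(r, t, n_trials) for t in rates] for r in initial_levels]
logged = [[math.log(x) for x in row] for row in raw]

raw_interaction = interaction(raw)
log_interaction = interaction(logged)
```

The raw table shows a nonzero interaction contrast although the two
treatments act independently, and the contrast vanishes on the log scale.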

Miscellaneous. Other attempts to 
correct for differences in initial level 
have employed difference scores and 
standardized scores based on final ac- 
quisition level. The validity of these 
and other possible correction pro- 
cedures can usually be checked by 
testing whether they work in special 
cases as was done for the covariance 
procedure. To the author’s knowl- 
edge, there are no justifiable methods 
except as noted in the previous section.
Since the test against a special case is
usually fairly easy, the miscellaneous 
correction procedures will not be fur- 
ther discussed. 

Experimental correction procedures. 
Since it would appear that no mathe- 
matical correction method will be per- 
fectly satisfactory, one might wonder 
whether the desired end could be 
reached by experimental means. Two 
possibilities will be discussed briefly. 

First, subjects from different groups 
could be matched on the basis of their 
terminal acquisition performance. If 
such a matching procedure were ap- 
plied to the hypothetical data of Fig- 
ure 1, the higher performers of Con- 
dition 1 would tend to be matched with 
the lower performers of Condition 2, 
and some subjects would be eliminated 
from each condition. This subject 
elimination could yield biased samples, 
and in fact the procedure would only 
make sense if the terminal acquisition
level were uncorrelated with the 
proper measure of resistance to extinc- 
tion. Since this seems unlikely on the 
face of it, since it probably could not be 
tested without a mathematical correc- 
tion method, and since in any case the 
matching will be imperfect because of 
a regression artifact resulting from un- 
reliability in the matching variable, 
this procedure has little to recommend 
it.

A second approach would be to 
terminate acquisition training at some 
chosen performance criterion in an at- 
tempt to equate subjects on terminal 
acquisition response. Although this 
criterion method has a certain attrac- 
tiveness, several limitations and cau- 
tions must be kept in mind in any 
attempt to use the method. 

In the first place, the criterion would 
have to be such that all subjects would 
normally attain it. This would be
sometimes infeasible, and often
experimentally undesirable. In the second
place, the criterion method does not
completely equate subjects in terms of
performance. As is well known (e.g.,
Hilgard, 1938; Underwood, 1954), if 
a postcriterial trial were run, then the 
slower learners would generally be in- 
ferior to the faster learners, due both 
to learning on the criterial trials, and to 
a regression artifact resulting from the 
use of a criterion. In general, there- 
fore, a direct comparison of extinction 
performance would not be justified. 
In fact, it would usually be necessary 
to include some fixed number of post- 
criterial acquisition trials large enough 
to allow a reliable assessment of how 
well the criterion method had suc- 
ceeded. And then, if the final acquisi- 
tion differences between conditions 
were not negligible, one of the pre- 
viously discussed correction methods 
would have to be applied. In the 
third place, it will be realized that the 
criterion method may complicate the 
interpretation of the results since the 
attempt to equate performance would 
often result in systematic differences 
between conditions in terms of num- 
bers of reinforced and unreinforced 
trials. 

These limitations of the criterion 
method do not mean that it is not a 
useful step in the right direction, at 
least in certain experimental situ- 
ations. However, it is evident that 
the method must be used carefully. 

The above experimental correction 
methods are only two immediately ob- 
vious possibilities, and there may be 
others which are more effective. How- 
ever, in view of the nature of the 
problem, it does not seem likely that 
there is any generally applicable ex- 
perimental method which would not 
require some assumptions. 


RELATED APPLICATIONS


The preceding discussion has been
explicitly concerned with resistance
to extinction. This section takes up
some related areas of application of 
the shape function method. It is still 
assumed, of course, that a correction 
for different initial and/or final re- 
sponse levels is appropriate to the 
problem at hand. 

In fact, the shape function method 
makes no assumptions which are pecul- 
iar to the extinction situation. The 
method may be used equally well in the 
comparison of any learning curves,
regardless of their origin. The
development given above applies without
change to such problems as the com- 
parison of resistance to transfer, and to 
the comparison of acquisition curves 
obtained under different experimental 
conditions.

More generally, it is to be noted 
that the shape function method is not 
restricted to learning data but may be 
applied to the comparison of curves of 
any sort. Thus, for example, the 
McCrary-Hunter (1953) hypothesis 
that various bowed serial learning 
curves have the same shape could be 
tested by the procedures mentioned in 
connection with Equation 9. General- 
ization gradients might also be amena- 
ble to treatment with the method. 

It should be realized, of course, that 
application of the shape function 
method is subject to the limitations 
discussed in a subsequent section. In 
particular, there may be other aspects 
of the data which have more interest 
than curve shape. In transfer studies, 
for instance, it occasionally happens 
that different acquisition conditions 
asymptote at different levels in a com- 
mon transfer condition (Anderson, 
1960; Kimble, Mann, & Dufort, 1955). 
The asymptotic transfer level may 
then be taken as a measure of re- 
sistance to transfer and presumably 
would be the main object of interest 
even though it might still be useful to
compare the shape functions of the
transfer curves. 

The problem of comparing retention 
would require treatment in its own 
right. However, since it is not unre- 
lated to the present discussion a few 
remarks on this problem may be use- 
ful. In the first place, if retention 
were measured at a number of time 
points, the resulting curves could be 
compared using the shape function 
method assuming, of course, that the 
definition of shape employed in this 
method was considered appropriate. 

More often, however, retention is 
measured at only one time point and 
the situation is then more complex. 
In this case, the shape function method 
reduces to 


R(2) = R(\infty) - [R(\infty) - R(1)] f(2),


where R(1) is the final acquisition 
level, R(2) is the observed retention, 
R(∞) is the long-term retention, and
f(2) is the value of the shape function 
at the time that retention is measured. 
It is desired to compare the values 
of f(2) for different experimental con- 
ditions and this could be done if R(1) 
and R(∞) were known.
Unfortunately, this will seldom be the case.
If a criterion were used in acquisition, 
as is often done in verbal learning 
studies, then the estimates of R(1) 
may be biased. And, although it is 
probably true that retention will be 
zero after infinite time, the appropri- 
ateness of this long-term view seems 
dubious. It would appear, therefore, 
that the above equation will in general 
have two or three unknowns and so 
be insoluble from the data. Of course, 
just as with resistance to extinction, 
there may be cases in which visual 
inspection is sufficient. In general, 
however, it would seem that a differ- 
ent approach is to be sought. 
Underwood’s (1954) method of
successive probability analysis may
offer some hope in this direction.
Basically, this method attempts to obtain
a curve for associative strength for 
different items at the end of acquisi- 
tion. Such curves are obtained for 
different conditions and retention is 
compared across conditions for items 
which have the same associative 
strength in acquisition. Underwood's 
method must, of course, rest on some 
model but the assumptions of that 
model have not been elucidated. More- 
over, the use of a criterion in acquisi- 
tion would be suspected to make the 
application of the model difficult even 
if it were known to be correct in 
principle. 

Nevertheless, the basic idea of Un- 
derwood’s method is ingenious and 
further work in that direction may 
pay off. It may be suggested that it 
would be desirable to have the associ- 
ative strength at the end of acquisition 
under better experimental control. 
This could be done, for instance, by 
giving subgroups in each condition dif- 
ferent fixed numbers of acquisition 
trials. 


LIMITATIONS OF THE SHAPE 
FUNCTION METHOD


There are two main considerations 
which limit the applicability of the 
shape function method: it may be mis- 
leading, and it may be incorrect. 

The first limitation arises from the 
fact that average curve shape is an 
overall characteristic of the data. As 
with any average, preoccupation with 
average curve shape may result in the 
neglect of important information. 
Thus, the desirability of considering 
individual differences in curve shape 
has been noted by many (e.g., Hilgard, 
1938). Somewhat different aspects of 
this same general problem are illus- 
trated in the following two simple 
models.

In the all-or-none, one-trial learning
model discussed by Bush and
Mosteller (1959), the curve shape for 
a single subject is a step function 
whereas the mean curve over subjects 
is a simple growth function. This 
model may be a realistic one for some 
learning situations (e.g., Hayes, 1953; 
Philbrick & Postman, 1955; Voeks, 
1954). Suppose that it applies to a 
given extinction situation so that there 
is a certain probability that the ex- 
tinction will occur on a given trial if it 
has not yet occurred. If this proba- 
bility of extinction is different for two 
different groups, then their mean ex- 
tinction curves will be different. The 
shape function method will show this 
and so gives a truthful result. How- 
ever, in such a situation, it would be 
more enlightening to use a backward 
learning curve (Hayes, 1953), or per- 
haps the descriptive statistics derived 
by Bush and Mosteller (1959) and by 
Bower (1961). 
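
The contrast between the individual and the mean curve may be made concrete with a short Python sketch. It assumes, for illustration only, that a subject responds with probability 1 until extinction occurs and with probability 0 thereafter; the function names and the value of the extinction probability are hypothetical.

```python
def subject_curve(extinction_trial, n_trials):
    """Step function for one subject whose last responding trial is extinction_trial."""
    return [1.0 if n <= extinction_trial else 0.0 for n in range(1, n_trials + 1)]

def mean_curve(c, n_trials):
    """Mean response probability over subjects.

    Extinction occurs after any trial with probability c if it has not yet
    occurred, so the proportion still responding on trial n is (1 - c)**(n - 1).
    """
    return [(1.0 - c) ** (n - 1) for n in range(1, n_trials + 1)]

print(subject_curve(3, 6))    # a single subject: a step
print(mean_curve(0.25, 6))    # the group mean: a smooth geometric decline
```

Two groups with different values of c give different mean curves, as stated above, although no individual subject shows a gradual decline.
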

For the second example, consider 
extinction after avoidance conditioning 
and suppose that the change in CR 
probability on a given trial depends on 
whether or not a CR occurred on that 
trial. Clearly two parameters will be 
needed in order to represent the dis- 
tinct effects of CR and non-CR trials. 
The shape of the mean extinction 
curve will represent an average or re- 
sultant of the two parameters. Study 
of the shape will not give very specific 
information about the parameters indi- 
vidually and may not even be adequate 
to show that a two-parameter model is 
required. In this sort of situation, an 
analysis of the sequential dependencies 
(Anderson, 1959; Sheffield, 1948) 
would appear to offer more promise 
than the shape function method. 

The second and more serious limita- 
tion on the shape function method 
arises from the definition of shape 
used in this method. Equation 3 implicitly
assumes that it is legitimate
to separate the initial response level, 
R(1), from the shape of the curve, 
f(n). This separation is not neces- 
sarily proper. In the two-parameter 
avoidance conditioning model just 
mentioned, the shape function method 
gives an incorrect result. 

To illustrate this numerically, con- 
sider two groups, A and B, with re- 
spective CR probabilities of 0.8 and 0.6 
on Trial 1 of extinction. Suppose that 
for both groups response probability 
decreases by the proportion 1/2 on non-CR
trials, and by the proportion 1/10
on CR trials. Since these parameters 
are the same for both groups, they 
should presumably be considered 
equally resistant to extinction. Now 
in Group A, 0.8 of the subjects will 
make a CR on Trial 1; their response 
probability decreases by 1/10 to become
0.72 on Trial 2. The remaining 0.2 
of the subjects of Group A will not 
make a CR on Trial 1; their response 
probability decreases by 1/2 to become
0.40 on Trial 2. The mean response 
probability of Group A on Trial 2 is 
therefore 0.656. Similarly, the mean 
response probability for Group B on 
Trial 2 is 0.444. The decrement for 
Group A is 0.144; that for Group B is 
0.156. Since the group initially higher 
shows the smaller decrement, it may 
be suspected that the shape function 
method will find different curve shapes 
for the two groups. Application of 
Equation 3 confirms this: the mean 
values of f(2) are 0.82 and 0.74 for 
Groups A and B, respectively. The 
shape function method thus gives an 
incorrect result and it is fairly easy 
to show that it will always do so for 
this model unless the two parameters
are equal.
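
The arithmetic of this illustration may be checked with the following Python sketch. The function names are hypothetical, and the terminal level R(∞) is taken as zero, which is the assumption that reproduces the f(2) values quoted in the text.

```python
def trial2_mean(p1, cr_decrement, noncr_decrement):
    """Mean response probability on Trial 2 for the two-parameter model."""
    after_cr = p1 * (1.0 - cr_decrement)        # subjects who made a CR on Trial 1
    after_noncr = p1 * (1.0 - noncr_decrement)  # subjects who did not
    return p1 * after_cr + (1.0 - p1) * after_noncr

def f2(p1, cr_decrement, noncr_decrement):
    """Shape function value f(2) = R(2) / R(1), taking R(inf) = 0."""
    return trial2_mean(p1, cr_decrement, noncr_decrement) / p1

for label, p1 in (("A", 0.8), ("B", 0.6)):
    r2 = trial2_mean(p1, 0.1, 0.5)
    print(label, round(r2, 3), round(p1 - r2, 3), round(f2(p1, 0.1, 0.5), 2))
```

When the two decrements are set equal, for example both at 0.3, f2 returns the same value for both groups, in line with the remark that the discrepancy disappears only when the two parameters are equal.
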

Of course, the preceding model il- 
lustration may have no basis in reality. 
However, it does show that the shape
function method does not guarantee a
correct result. 

It would thus appear desirable to 
know exactly what specific models are 
subsumed by the shape function 
method. The following remarks are 
incomplete but should serve as a rough 
practical guide. 

The shape function method will be 
applicable when the following condi- 
tions are met: (a) there is a family 
of linear operators which may be dif- 
ferent for each subject and for each 
trial; (b) the operators have the form 
of Equation 2 and, for a given subject,
have all the same R(∞), differing
only in the rate parameter, θ; (c) the
change in response for a given subject 
on a given trial is governed by an 
operator chosen from the appropriate 
family with a probability independent 
of the overt response. The null hy- 
pothesis to be tested is then that the 
values of θ are, on the average, the
same for the various experimental 
conditions. 

As a special case of these conditions, 
there may be only one operator in each 
family. The shape function method 
is always applicable to such one-pa- 
rameter linear operator models even 
though the parameter may change or 
fluctuate from trial to trial for a given 
subject. 
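
A minimal Python sketch of this special case follows. Equation 2 is not reproduced in this section, so the operator is assumed here to have the standard linear form R ← R − θ[R − R(∞)]; the function names and all numerical values are hypothetical.

```python
def curve(r1, r_inf, thetas):
    """Apply R <- R - theta * (R - r_inf) once per trial, starting from r1."""
    values = [r1]
    for theta in thetas:
        values.append(values[-1] - theta * (values[-1] - r_inf))
    return values

def shape(values, r_inf):
    """f(n) = [R(n) - R(inf)] / [R(1) - R(inf)], the curve rescaled to start at 1."""
    return [(r - r_inf) / (values[0] - r_inf) for r in values]

thetas = [0.30, 0.10, 0.25, 0.05]   # hypothetical rates that fluctuate over trials
high = shape(curve(0.9, 0.1, thetas), 0.1)
low = shape(curve(0.5, 0.1, thetas), 0.1)
print([round(v, 4) for v in high])
print([round(v, 4) for v in low])
```

Under this assumed operator the two groups start at different levels but yield the same shape function, because f(n) reduces to the running product of the factors (1 − θ) and does not involve R(1).
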

The listed conditions are sufficient 
but not necessary. Under certain con- 
ditions different operators may have 
different values of R(∞). In addition,
the method applies to continuous
response models (Anderson, 1961, in 
press) in which the change in re- 
sponse, though not the rate parameter 
of the operator, depends on the sub- 
ject’s actual response. 

On the other hand, the above avoid- 
ance extinction model suggests that the 
shape function method will not be ap- 
plicable if the rate parameter of the 
operator applying on a given trial depends
on the response on that trial.
Although this is plausible, it has not 
been proved and may not be true. 


Discussion 


The present article has considered 
the comparison of resistance to ex- 
tinction specifically when a correction 
for differences in initial extinction re- 
sponse level is required. Such correc- 
tions necessarily rest on some mathe- 
matical model and of those that have 
been examined here, only the para- 
metric method based on some specific 
model, and the shape function method 
have an objective justification. 

The use of a specific model is the 
more attractive in several ways. Pa- 
rameter estimates are more precise 
descriptive statistics than overall curve 
shape, and would be expected to be 
more directly meaningful. Moreover, 
it is always possible, at least in prin- 
ciple, to test whether a specific model 
adequately fits the data. This feature, 
which is not possessed by the shape 
function method, helps somewhat to 
guard against unjustified interpreta- 
tions of the statistical analysis. Typi- 
cally, also, specific models yield theo- 
retical expressions for many aspects of 
the data (e.g., Anderson, 1959, 1963; 
Bush & Mosteller, 1955, 1959; Bush 
& Sternberg, 1959), and these may 
pinpoint deficiencies in the psycho- 
logical analysis underlying the choice 
of model. Unfortunately there are 
few known experimental situations 
for which a specific model does fit the 
data very well. For the particular 
purpose of comparing resistance to ex- 
tinction, therefore, the parametric ap- 
proach is not too satisfactory. 

The shape function method, on the 
other hand, comprises a large class of 
specific models. Since no further assumption
is required about which particular
one applies in any given case,
the method has considerable presumptive
generality.

In addition, the shape function 
method is easy to use whereas parameter
estimation for many of the specific
models which it includes may
be quite difficult. In fact, if the ex- 
tinction rate parameter changes over 
trials, a specific model may be next 
to impossible mathematically. Conse- 
quently, if the main interest lies in 
comparing resistance to extinction, the 
shape function method has much to 
recommend it. Nevertheless, the 
shape function method may not be uni- 
versally appropriate as was shown by 
the above avoidance conditioning 
model. Indeed, it seems evident that 
there is no universal method to be 
discovered in the present state of 
knowledge since comparing resistance 
to extinction is a theoretical problem 
not to be answered from data alone. 

This last point may be seen more 
clearly if one aspect of the concept of 
resistance to extinction is taken up 
somewhat more explicitly. It has already
been noted that resistance to extinction
may be multidimensional.
Even in the simple linear model, the 
terminal level might on occasion be 
used as a measure of resistance to 
extinction in addition to, or instead of 
the extinction rate. The cited avoid- 
ance model used two extinction rates 
and the same would be expected if, 
for instance, extinction were charac- 
terized by decreases in the strengths of 
fear and of habit. On a more general 
level, the possibility must be allowed 
that more than one psychological proc- 
ess underlies the extinction behavior. 
Logically, it is possible that one could 
get a direct, empirical, assumption- 
free assessment of each process sepa- 
rately but this does not seem likely in 
practice. If it is not possible, then the 
assistance of theory will be required 
to unravel the laws of extinction behavior
and through them allow the
measurement of the parameters of the 
laws in terms of which resistance to 
extinction is to be defined. 

If resistance to extinction is not a 
single dimensional concept, then a 
single measure will not ordinarily be a 
sufficient descriptive statistic. A sin- 
gle measure would represent some 
average resultant of the processes un- 
derlying the behavior, but the exact 
definition of this average could proba- 
bly not be determined without the aid 
of theory. Of course, a single measure 
could still be a useful index of over- 
all effects even if it was not as in- 
cisive as would be desired and even 
though on occasion it might be mis- 
leading. 

Some question thus arises as to the 
advisability of attempting to compare 
resistance to extinction when a cor- 
rection for initial level is required. 
However, the alternative procedure, 
that of random assignment to extinc- 
tion conditions, is not a panacea. 
Thus, initial extinction levels may dif- 
fer because of change of set or from 
differential stimulus generalization, 
even though the extinction rates themselves
might be the same. Nor does
random assignment rule out the possibility
that the observed differences
are not some misleading average or
resultant of differentially influential
processes. Finally, even though a
random assignment procedure is most 
useful in demonstrating that certain 
experimental variables have an effect 
on the behavior, such knowledge goes 
only part way toward elucidating the 
laws of extinction. 

It is hoped that the foregoing re- 
marks have set out the nature of the 
problem more clearly. In particular, 
it is clear that if extinction is a multi- 
dimensional concept, then the shape 
function method cannot, at least in its
present state of development, be expected
to give a representation of the
behavior which is adequate for all pur- 
poses. However, this portrayal of the 
difficulties inherent in obtaining a com- 
plete and final solution to the nature of 
extinction behavior should not obscure 
the usefulness of and necessity for 
work at a grosser level of analysis. In 
certain cases, as when the initially 
higher of two curves falls below the 
other, there would be little hesitation 
in the interpretation, and other curve 
configurations could be equally convincing.
Even when the differences
are less pronounced, the shape func- 
tion method may be adequate even 
if it is only approximately correct. 
For such work, the shape function 
method should be a useful tool even 
though sometimes the inferences based 
on its use will be tentative. In such 
cases, of course, it would evidently be 
desirable to include as detailed a de- 
scription of the data as practicable in 
order to facilitate the reader’s interpre- 
tation of the results. 

Although the present discussion has 
been largely concerned with resistance 
to extinction, the problems which are 
found here arise in other areas of 
learning also. [...]
only to touch on such related applications
but it may be hoped that a more
concerted attack would lead to more 
extensive results. Perhaps also the 
discussion will have analogical value. 
The question studied here is but one 
instance of the general problem of 
comparing the behavior of populations 
which are pre-experimentally different. 
If the foregoing mathematical model 
analysis has set out more clearly the 
nature of this problem in the present 
case, it seems reasonable to expect 
that the same approach will be helpful 
in other instances of the general problem
as well.


QUANTITATIVE LAWS FOR SENSORY PERCEPTION


JOHN L. STEWART


University of Arizona 


A model for subjective intensity derived from an elementary sensor 
provides linear filtering, rectification with variable power law exponent, 
and finite time averaging. The model is consistent with the physiological
measure of average neural pulse rate. Simplified mathematical representations
are employed to explain partial and complete masking. The
Stevens law and a modified Weber law are derived as special cases. 
When extended to an array of sensors, a broadly significant pattern 
theory for recognition results which explains diplacusis and other phenomena.
Direct electronic simulation may be achieved (and has been) so
as to yield solutions to problems which are too complex to be analyzed
in other ways.


A general theory is presented which 
considers two phenomena. First there 
is developed a representation for 
subjective intensity from which quan- 
titative theories for signal suppression 
by noise (masking) and just noticeable 
differences (the Weber law) result. 
Properties of neural patterns which 
permit recognition of complex stimuli 
are then discussed. Although applied 
primarily to hearing, the theory is not 
necessarily restricted to this one 
sensory system. 

The concepts presented here lead to 
design parameters for an electronic 
analog representation of a typical 
sensory receptor. The analog con- 
stitutes a close model in that the 
components of the device bear a one- 
to-one relationship with parts of the 
sensor. One such device has been 
constructed for hearing (Caldwell, 
Glaesser, & Stewart, 1962).


Classical Concepts


Considerable discussion has been 
devoted to subjective intensity, mask- 
ing, and the Weber law (Helm, 
Messick, & Tucker, 1961; Luce, 1959;
Stevens, 1959, 1961). The usual as- 
sumption is an empirical mathematical 
formula such as 


Loudness = K F^m [1]


where K is a constant of propor- 
tionality, m is the power-law ex- 
ponent, and F is the measure of 
stimulus intensity. The term “loud- 
ness,” which applies to hearing, is 
employed in a generic sense. 

The quantity F in Equation 1 is 
the mean square value of the stimulus 
time wave form, or something pro- 
portional to a power of the mean 
square value. In hearing, for ex- 
ample, F is the mean square value of 
the pressure wave form f(t) defined as 


F = \langle f^2(t) \rangle = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f^2(t)\,dt [2]


whereas in taste F might be related to 
chemical concentration. Note in 
Equation 2 that an infinite averaging 
time is specified which in turn implies 
that f(t) is “stationary” (i.e., its
several statistical measures do not 
change with time).

Equation 1 is found to provide a
better fit with empirical data by 
including another adjustable param- 
eter as (Zwislocki & Hellman, 1960) 


Loudness = K(F + constant)^m [3]


Stimulus F may be only partly 
external; there may be included a 
component internal to the biological 
organism. This suggests that f(t) be 
separated into additive parts as 
a signal s(t) and an interference 
(“noise”) n(t) as 

f(t) = s(t) + n(t) [4]

where s(t) and n(t) may in themselves
be separated into additive parts. If
signal and noise are independent of
one another such that modifying s(t)
does not affect n(t) and conversely,
then mean square values are also
additive as

F = S + N [5]


where S is the mean square value of 
s(t) and N is for n(t). The constant 
in Equation 3 can thus be identified 
with noise. The power-law function 
of Equation 1 becomes the “loudness 
function” 


Loudness = L(S, N, n)
= K(S + N)^m = K(S + N)^{n/2}
= K N^{n/2} (1 + S/N)^{n/2}
= K'(1 + S/N)^{n/2} [6]


where there is defined n/2 = m for
reasons that will later become ap- 
parent. Because it is a general law, 
Equation 6 is plotted in Figure 1 for 
representative values of n. The im- 
portant variable is the ratio of mean 
square values S/N, which is com- 
monly referred to as the “signal-to- 
noise ratio.” In the case of sound as 
well as for many other types of 
stimuli, S/N is a power ratio. 
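
Equation 6 can be checked numerically with a short Python sketch. The function names and the numerical values are hypothetical; the relation K' = K N^(n/2) is read directly from the last two forms of Equation 6.

```python
def loudness(S, N, n, K=1.0):
    """Equation 6 in its first form: K * (S + N)**(n/2)."""
    return K * (S + N) ** (n / 2.0)

def loudness_ratio_form(S, N, n, K=1.0):
    """Equation 6 in its last form: K' * (1 + S/N)**(n/2), with K' = K * N**(n/2)."""
    K_prime = K * N ** (n / 2.0)
    return K_prime * (1.0 + S / N) ** (n / 2.0)

# Hypothetical values: noise mean square held at 2.0, signal-to-noise ratio varied.
for snr in (0.1, 1.0, 10.0):
    S, N = snr * 2.0, 2.0
    print(snr, loudness(S, N, n=0.6), loudness_ratio_form(S, N, n=0.6))
```

The two forms agree for every pair of values, which shows that once K' is fixed the loudness depends on the stimulus only through the ratio S/N.
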


Wave-Form Processing Model 


[Figure 1. Equation 6 plotted relative to K' for various values of power-law exponent n.]

Behavior as in Figure 1 and Equation 6
is familiar to the communications
scientist; the so-called power-law
detector behaves this way. The development
of a general wave-form
processing model may consequently 
be undertaken. Let f(t) be a general 
time wave form which represents ef- 
fective total stimulation of a sensory 
organ. As such, f(t) includes un- 
avoidable noise and may (or may not) 
be both positive and negative in sign 
as, for example, electric field strength 
variations in thermal radiation or 
sinusoidal pressure variations in sound. 
Subjective intensity, which has no 
meaning as a negative quantity, is 
logically determined by the magnitude 
of f(t), which is denoted |f(t)|. Since
no sensory organ can respond instan- 
taneously, subjective intensity must 
depend on some average in time of 
|f(t)|, which is denoted Av|f(t)|.
Upon generalization, subjective intensity
L may be represented in the
form of the average of a power series
in |f(t)| as


L = \mathrm{Av}\left\{ |f(t)|^{n} \left[ C_0 + C_1 |f(t)| + C_2 |f(t)|^{2} + \cdots \right] \right\} [7]


in which Exponent n need not be an
integer (and thus Equation 7 is more
general than a Taylor series) and
where coefficients C_0, C_1, C_2, ... may
be functions of time and perhaps also
of other parameters such as environmental
factors. Equation 7 constitutes
a loudness function hypothesis.

Subjective intensity is a nonnega- 
tive monotonically increasing function 
of stimulus intensity which usually 
displays a leveling off (or saturation) 
for sufficiently intense stimuli. Equa- 
tion 7 is capable of representing any 
(continuous) function of this type to 
arbitrary accuracy. 

The mechanism of averaging con- 
stitutes an integration over a finite 
time interval, and perhaps includes a 
weighting function. Since the manip- 
ulation is a linear one, the average of 
the sum is the sum of the averages as 


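L = \sum_{k=0}^{\infty} \mathrm{Av}\left\{ C_k |f(t)|^{n+k} \right\}
\approx \sum_{k=0}^{\infty} C_k\, \mathrm{Av}\left\{ |f(t)|^{n+k} \right\} [8]


where the approximation applies if
the C_k are slowly varying compared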
with the averaging time. Since the 
averaging time in Equation 8 is of the 
order of a fraction of a second, the 
approximation would seem to be valid 
for fatigue and adaptation which 
occur over periods of seconds or 
minutes or longer. 

The gross stimulus wave form f(t) 
may consist of several additive com- 
ponents, including unavoidable stimuli 
(noise) and purposeful stimuli (sig- 
nals). The several components may 
not be specified in similar ways. 
That is, some may be evaluated ex- 
ternal to the frequency sensitive path- 
way leading to a sensor and others 
may be specified internal to the neural 
system. The gross stimulus can be 
constructed from components which 
are affected by general operators of 
p = d/dt as


f(t) = F_1(p) f_1(t) + F_2(p) f_2(t) + \cdots [9]


where the F_k(p) for p = jω constitute
typical steady state transfer functions
of radian frequency ω as can be
represented with electric analog networks.
Without use of suitable
operators as in Equation 9, the
mathematical formalism does not
differentiate stimuli on the basis of
frequency components.

Considerable mathematical simpli- 
fication results for an infinite-duration 
averaging time in which no time 
weighting function exists. By use of 
the definition of Equation 2, Equation 
8 may be approximated as 


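L \approx \sum_{k=0}^{\infty} \langle C_k |f(t)|^{n+k} \rangle
\approx \sum_{k=0}^{\infty} C_k \langle |f(t)|^{n+k} \rangle [10]


where the second approximation assumes
that the C_k are constants.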

Generalized relationships thus far 
provided, especially for C_k constant or
slowly varying, can be modeled with 
electronic devices. The model be- 
comes an analog computer which can 
be employed in the study of wave 
forms f(t) which are too complex to 
permit reasonably direct mathemat- 
ical calculations to be made. As will 
later become evident, the general 
representation must be iterated in 
order to provide a model that describes
more than simple intensity
discrimination. In this case, not only 
are theoretical mathematical calcula- 
tions beyond normal reason to at- 
tempt, but even digital solutions using 
large machines are unreasonable—the 
only known effective approach to 
solutions for problems of this sort is 
direct analog simulation. 


Noise 


Noise in the gross stimulus f(t) may 
arise from external or internal sources. 
Because f(t) is thus in part random,
the loudness function as given in
Equation 8 is a random variable and 
hence is subject to study by statistical 
measuring and hypothesis testing 
techniques. For the approximation 
of Equation 10 in which an infinite 
duration time average is employed, 
the random nature of L is suppressed, 
although the proper average value is 
retained. Another interpretation of 
Equation 10 (as a variation of 
Equation 8) is that it gives L in the
ensemble average sense. 

Noise internal to the biological 
organism may stem partly from 
spurious neural impulses and partly 
from random variations in synaptic 
triggering levels (Frishkopf & Rosen- 
blith, 1958). An important but not 
commonly appreciated fact is that 
internal noise depends upon the gross 
stimulus. In order to evaluate this 
effect, assume that the sensory signal 
consists of a sequence of identically 
shaped neural pulses which occur at 
random in time at an average rate of k 
per second. The spectrum of the 
pulse sequence is 


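W(\omega) = \left[ k \int_{-\infty}^{\infty} g(t)\,dt \right]^{2} 2\pi\,\delta(\omega) + k\,|G(j\omega)|^{2} [11]


where g(t) represents a single pulse
with G(jω) its Fourier transform
(Stewart, 1960, p. 311). The average
value m and variance σ² are


m = k \int_{-\infty}^{\infty} g(t)\,dt [12]


\sigma^{2} = \frac{k}{2\pi} \int_{-\infty}^{\infty} |G(j\omega)|^{2}\,d\omega
= k \int_{-\infty}^{\infty} g^{2}(t)\,dt [13]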


where Equation 13 utilizes the Parse- 
val theorem. The pulse sequence is 
equivalent to a constant m upon
which is superimposed a random
component with root mean square
value σ. Because k increases with the
stimulus intensity, loudness may be
presumed to increase in some manner
with k; but since σ² increases with k,
unavoidable noise also increases with
loudness.

If m is the measure of the applied
stimulus (which may of itself consist
of a superposition of deterministic and
random variables), then σ is the
measure for induced noise internal to
the sensory system. A signal-to-unavoidable-noise
ratio m/σ may be
formed. It is observed that, for
neural pulses of constant shape,
m/σ ∝ √k and thus induced noise
becomes negligible by comparison
with the applied signal when loudness
is relatively large.
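
The square-root dependence can be illustrated with a short Python sketch, taking m = k ∫g(t)dt and σ² = k ∫g²(t)dt as in Equations 12 and 13. The rectangular pulse shape, the function name, and the numerical values are assumptions made only for the illustration.

```python
import math

def mean_and_sigma(k, height, width):
    """Mean and rms fluctuation of a random train of rectangular pulses."""
    m = k * height * width                       # k times the integral of g(t)
    sigma = math.sqrt(k * height ** 2 * width)   # sqrt of k times the integral of g(t)**2
    return m, sigma

for k in (10.0, 40.0, 160.0):                    # hypothetical pulse rates per second
    m, sigma = mean_and_sigma(k, height=1.0, width=0.001)
    print(k, m / sigma)                          # doubles each time k is quadrupled
```

For this pulse shape the ratio reduces to the square root of k times the pulse width, so the pulse height drops out and only the rate and duration matter.
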

Provided that the sensory system 
shows relatively few neural pulses in 
the absence of stimuli, and provided 
that the stimulus is also devoid of 
superimposed noise, it can be argued 
that induced noise has minimal (but 
not necessarily negligible) importance. 
The averaging process amounts to 
narrow-band low-pass filtering. Al- 
though the average value of the pulse 
sequence remains unchanged sub- 
sequent to the filter, the mean square 
value is greatly reduced because most 
of the spectral components of the 
individual neural pulses are not trans- 
mitted by the filter. In the case in 
point, random variations in synaptic 
trigger levels and spurious neural 
pulses are important only to the 
extent that they produce low-fre- 
quency noise. 

When noise contaminates the signal, 
on the other hand, induced noise can 
not be differentiated from stimulus 
noise prior to the filtering process and 
hence also subsequent to this process. 
In addition, spurious neural pulses 
and the randomness created by varying
synaptic triggering levels become
relatively more important. 


A First-Order Model for Loudness 


The simple “block-diagram” system
in Figure 2 is suggested as a model for 
a sensory system. The decision 
system in Figure 2 might perhaps 
apply a hypothesis test to the output 
from the “loudness converter.” The 
decision system does not greatly con- 
cern the theory of loudness because 
ensemble averages are implied. The 
subject of specific interest at this 
point is the detailed form of the loud- 
ness converter. 

The basic model proposed by Equation 7
(for C_k constant) may be
implemented with three sequential 
functions. First is a linear operator 
from the source of the (single) 
stimulus as specified by Equation 9; 
this operator may be realized with an 
electric network. Second is a full- 
wave or half-wave detector which 
provides power series response as 
represented within the braces of 
Equation 7. Finally there is an 
averaging device which can be realized 
with an electric network. The model 
constitutes a type of “averaging” (not 
“peak”) detector.
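
The last two stages of this sequence may be sketched in Python as follows. The sketch is a simplification under stated assumptions: the initial linear operator is omitted, only the leading term of the power series is kept (with C_0 = 1), and the averaging device is taken to be a one-pole running average. The function name, the sampling interval, and the tone frequency are hypothetical.

```python
import math

def loudness_converter(samples, dt, n=2.0, tau=0.1):
    """Power-law full-wave rectifier followed by a one-pole running average."""
    y = 0.0
    out = []
    for x in samples:
        rectified = abs(x) ** n             # detector: leading term |f(t)|**n only
        y += (dt / tau) * (rectified - y)   # averaging device with time constant tau
        out.append(y)
    return out

dt = 1.0e-4                                 # seconds per sample (hypothetical)
tone = [math.sin(2.0 * math.pi * 1000.0 * i * dt) for i in range(20000)]
out = loudness_converter(tone, dt)          # 2 seconds of a unit-amplitude tone
print(out[-1])                              # settles close to the mean square value
```

For a unit-amplitude sine wave with n = 2 the output settles near 0.5, the mean square value, which is the behavior expected of an averaging rather than a peak detector.
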

The behavior of the linear operator 
depends intimately upon the par- 
ticular sensory system. This operator 
provides at least some frequency 
selectivity so that, for example, fre- 
quencies pertaining to heat and light 
are not permitted to excite the 
auditory apparatus. One part of the 
linear operator function is common to 
many different sensory systems such 
that volleys of neural pulses may be 
modeled. A high-pass transfer function
jωτ/(1 + jωτ) (or a more complex
one if warranted) provides this
specific representation. The critical
cyclic frequency 1/(2πτ) depends upon
the sensory system; for hearing it 
is of the order of 1 kilocycle per 
second whereas for some receptors it 
may be very small and even zero (as, 
for example, when a steady pressure 
results in a stream of neural pulses 
without evidence of volleys). At 
relatively high frequencies, the function
jωτ/(1 + jωτ) becomes unity,
which implies that volleys no longer 
exist. 
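
The behavior of the high-pass function at low and high frequencies can be seen from a minimal Python sketch. The function name is hypothetical, and the time constant is chosen so that the critical frequency 1/(2πτ) is 1 kilocycle per second, the order of magnitude cited for hearing.

```python
import math

def high_pass(omega, tau):
    """Complex response j*omega*tau / (1 + j*omega*tau)."""
    jwt = 1j * omega * tau
    return jwt / (1.0 + jwt)

tau = 1.0 / (2.0 * math.pi * 1000.0)   # critical frequency of 1 kilocycle per second
for f in (10.0, 1000.0, 100000.0):
    print(f, abs(high_pass(2.0 * math.pi * f, tau)))
```

The magnitude is small well below the critical frequency, equals 1/√2 at the critical frequency, and approaches unity well above it.
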

The detector may be realized with 
diode electronic devices, among others. 
The detector must show saturation 
because a sensory cell has an upper 
limit to its firing rate. The loudness 
converter in Figure 2 represents a 
fairly elementary sensor and hence it 
can model detection by only a single 
cell (or a patch of sensory cells 
provided that all of the cells in the 
patch are excited equally). 

The averaging device can be repre- 
sented with a simple low-pass electric 
filter which has a cutoff characteristic 
with a slope of 6 decibels per octave. 
The bandwidth of this filter may be of 
the order of 1 or 2 cycles per second, 
which corresponds to a time constant 
of the order of 1/10 second (Munson, 
1947; Zwislocki, 1960).
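
The correspondence between time constant and bandwidth, and the 6-decibel slope, may be verified with a short Python sketch. A one-pole filter 1/(1 + jωτ) is assumed as the simplest filter having that slope; the function names are hypothetical.

```python
import math

def cutoff_hz(tau):
    """Half-power frequency of a one-pole low-pass filter with time constant tau."""
    return 1.0 / (2.0 * math.pi * tau)

def gain_db(f, tau):
    """Gain in decibels of 1 / (1 + j*2*pi*f*tau)."""
    return -10.0 * math.log10(1.0 + (2.0 * math.pi * f * tau) ** 2)

tau = 0.1                                          # seconds
print(cutoff_hz(tau))                              # between 1 and 2 cycles per second
print(gain_db(100.0, tau) - gain_db(200.0, tau))   # loss per octave far above cutoff
```

A time constant of 1/10 second gives a cutoff near 1.6 cycles per second, within the range of 1 or 2 cycles per second stated above.
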

The idealized detector as described 
is not capable of representing induced 
noise. In order to do this, the de- 
tector (but not the initial operator or 
the averaging device) may be replaced 
with a suitable random pulse genera- 
tor, or with a collection of such 
generators for the representation of a 
patch of sensory cells. The average 
pulse rate from the generator should 
be a monotonic function of the applied 
signal amplitude (subsequent to the 
initial operator) and saturation in the 
form of a maximum allowable pulse 
rate should also exist.

The representation for certain sensory
systems may require two detectors;
an example of this is the
thermal sensor. The oscillating elec- 
tric field of heat radiation (or me- 
chanical vibration of atoms in heat 
conduction) is at far too high a 
frequency to be followed in detail by a 
sensor. The result is an initial 
square-law detection with an averag- 
ing time that may be related to the 
thermal time constant of the sensor. 
Subsequent to this step is the con- 
version to neural pulses which is 
followed in turn by a second averaging 
process. 


Special L Functions 


The remainder of this paper pre- 
sumes the ideal detector model in 
which induced internal noise is not 
produced. The admitted approxima- 
tion is a reasonable one, especially if 
variable internal noise is allowed to 
exist (at least conceptually). 

The number of specific signal-noise 
combinations that can be studied by 
means of the general theory is un- 
limited. The two cases of most 
practical importance can (fortunately) 
be solved for in general. In both 
cases, assume f(t) is as given by
Equation 4 with mean square values 
as in Equation 5. The first case of 
interest is for s(t) a sine wave with 
peak value R (and hence S = R?/2) 
and n(t) Gaussian noise. Using the 
second approximation of Equation 10, 
there is obtained 


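L \approx \sum_{k=0}^{\infty} C_k \langle |f(t)|^{n+k} \rangle
= \sum_{k=0}^{\infty} C_k K_k\, {}_1F_1\!\left( -\frac{n+k}{2};\, 1;\, -\frac{S}{N} \right) [14]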


where ₁F₁ is the confluent hypergeometric
function and where

[Equation 15, which defines K_k, is missing from this copy]

in which Γ is the gamma function
(Middleton, 1960; Rice, 1954). 

Note in this general sine-wave case 
that Components s(t) and n(t) are
independent variables, or in other 
words, they are orthogonal in the 
infinite time interval. This condition 
may be stated as 


\langle s(t)\,n(t) \rangle = \langle s(t)\,n(t + t_0) \rangle = 0 [16]


where the relation involving t_0 (t_0 < ∞)
is the cross-correlation function. 

The calculation of Equation 14 
need not require direct time in- 
tegration. If it is presumed that the 
ergodic hypothesis applies, the sta- 
tistical average may be obtained 
instead. The calculation may employ 
the moment generating function 
(Middleton, 1960, p. 248) or the prob- 
ability density function for the en- 
velope of s(t) +n(t) (Rice, 1954, 
Equations 3.10-3.12). 

The second major case of interest 
assumes that s(t) and n(t) are both
Gaussian random variables which are 
statistically independent. This case 
may be obtained from Equation 14 
upon setting S = 0 [and note that
₁F₁(a; b; 0) = 1] and reinterpreting
N as S + N. Thus


$$L \approx \sum_{k=0}^{\infty} C_k K_k \,(1 + S/N)^{\ldots}$$ [17]


where $K_k$ is the same as before.

If saturation of the sensory system 
is not important, adequate approxi- 
mations for Equations 14 and 17 may 
be realized by ignoring all but the 
leading terms. There results 


$$L \approx C_0 K_0 \; {}_1F_1(-n/2;\ 1;\ -S/N), \quad \text{sine wave } s(t)$$ [18]


$$L \approx C_0 K_0 \,(1 + S/N)^{n/2}, \quad \text{Gaussian } s(t)$$ [19]

which are compared for three values
for n (n = 1.0, 2.0, and 3.0) in Figure
3. The remarkable similarity is to be 
noted. Under conditions for which 
Equations 18 and 19 apply, it is 
evident that (a band of) Gaussian 
noise may approximate a sine wave 
and conversely. Similarities between 
Equations 18 and 19 may be further 
demonstrated by means of asymptotic 
forms as 


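$$L_{S \ll N} \approx C_0 K_0 \left[ 1 + (n/2)(S/N) \right], \quad \text{sine wave and Gaussian } s(t)$$ [20]


$$L_{S \gg N} \approx \begin{cases} C_0 K_0 \,(S/N)^{n/2} / \Gamma(1 + n/2), & \text{sine wave } s(t) \\ C_0 K_0 \,(S/N)^{n/2}, & \text{Gaussian } s(t) \end{cases}$$ [21]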
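
The similarity between the sine-wave and Gaussian cases can be checked numerically. The following is a minimal sketch only, under stated assumptions: both leading terms are divided by C₀K₀, the signal-to-noise ratio is kept below about 100, and the helper names (`hyp1f1_neg`, `loudness_sine`, `loudness_gauss`) are hypothetical names introduced here. The confluent hypergeometric function is summed after Kummer's transformation, ₁F₁(a; b; −x) = e^(−x) ₁F₁(b − a; b; x), so that every series term is positive.

```python
import math


def hyp1f1_neg(a, b, x, terms=400):
    """Evaluate 1F1(a; b; -x) for 0 <= x <= ~100 (hypothetical helper).

    Kummer's transformation, 1F1(a; b; -x) = exp(-x) * 1F1(b - a; b; x),
    replaces an alternating series by one with positive terms.
    """
    c = b - a
    term = 1.0
    total = 1.0
    for k in range(terms):
        term *= (c + k) * x / ((b + k) * (k + 1))
        total += term
    return math.exp(-x) * total


def loudness_sine(snr, n):
    """Leading term for a sine wave in Gaussian noise, divided by C0*K0."""
    return hyp1f1_neg(-n / 2.0, 1.0, snr)


def loudness_gauss(snr, n):
    """Leading term for a Gaussian signal in Gaussian noise, divided by C0*K0."""
    return (1.0 + snr) ** (n / 2.0)


if __name__ == "__main__":
    for n in (1.0, 2.0, 3.0):
        for snr in (0.01, 0.1, 1.0, 10.0, 100.0):
            print(n, snr, loudness_sine(snr, n), loudness_gauss(snr, n))
```

For n = 2 the two expressions coincide; for small S/N both approach 1 + (n/2)(S/N); for large S/N their ratio tends to 1/Γ(1 + n/2), in agreement with the asymptotic forms.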


It is of interest to note that the 
special result of Equation 19 for C₀
constant is the so-called Stevens law,
L ∝ (S + N)^(n/2) = (S + N)^m. This
empirical law has been observed to be
valid under conditions as in Equation
19 (noise stimulus in a noise background)
and approximately valid for
the sine-wave signal as can be predicted
from Figure 3. The present
theory does not support the special 
cases as fundamental but rather offers 
Equation 7 for this. 

The Stevens law bears surface 
resemblance to Fechner’s law. If 
L ∝ (S + N)^m, then log L ∝ m
log (S + N). If a new function is
defined as L₁ ∝ m log (S + N), the
Fechner law results (compare L₁ with
log L). Fechner’s result can be and has
been thoroughly discredited (Stevens,
1961). Its unreasonableness stems 
from the mathematical fact that it is 
irregular (logarithmic singularity) at 
the origin. In other words, if sensory 
deprivation results in S + N → 0,
then L₁ → −∞, which is physically
meaningless. 

It may be observed that mean 
square values S and N in foregoing 
special cases do not show spectrum 
data explicitly. This results because
effective values F₁(p)f₁(t) and
F₂(p)f₂(t) as in Equation 9 have been
presumed. Whatever may be the 
stimulus and noise components, their 
mean square values must be corrected 
with equivalent transfer functions 
prior to use in Equations 18 and 19. 


Masking 


For simplicity, the ensuing discus- 
sion will be made in terms of the 
Stevens law; the reasonableness of the
approximation for the case of a sine 
wave in noise has been demonstrated. 

The loudness function may be 
found from Equation 19 for large 
and small S/N as 


$$L_{S \gg N} \approx K S^{n/2}$$ [22]

$$L_{S \ll N} \approx K N^{n/2} \left[ 1 + (n/2)(S/N) + \cdots \right]$$ [23]

which shows that loudness grows
with S at a rate that depends on the
signal-to-noise ratio. Specifically, for
n < 2 and for constant S, the loudness
due to S is diminished by increasing N.

A recognition function hypothesis 
may be formed in terms of a definition
for “recognition function” P as


$$P(S, N, n) = L(S, N, n) - L(0, N, n)$$ [24]


which for the case of Gaussian s(t)
gives


$$P(S, N, n) = K \left[ (S + N)^{n/2} - N^{n/2} \right]$$ [25]


Further development of suitable 
predictive formulas is enhanced by 
means of an imaginary experiment as 
follows: In a low-noise environment
N₁, there exists a signal S₁ which is
large by comparison. The noise is
increased from N₁ to N₂; in order to
maintain constant recognition P, the
signal S₁ must be increased to
S₂ = δ²S₁ [which increases s(t) by
the factor δ]. In general, the required
increase in the signal is not proportional
to that of the noise. Two P
functions may be formed and equated
in order to determine the required δ² as


$$P(S_1, N_1, n) = P(\delta^2 S_1, N_2, n)$$ [26]

which for Gaussian s(t) gives


$$\delta^2 = \frac{\left[ (S_1 + N_1)^{n/2} + N_2^{n/2} - N_1^{n/2} \right]^{2/n} - N_2}{S_1}$$ [27]


For the important case N₁ ≪ N₂,
ignore N₁ to find


$$\delta^2 = \left[ 1 + (N/S)^{n/2} \right]^{2/n} - N/S \qquad (N_1 \ll N_2)$$ [28]


where N₂ = N and S₁ = S have been
substituted. The very special case
n = 1 in Equation 28 yields the simple
result


$$\delta^2 = 1 + 2\sqrt{N/S} \qquad (N_1 \ll N_2 \text{ and } n = 1)$$ [29]

For a sine wave s(t), results are
mathematically more cumbersome but
qualitatively and quantitatively are
quite similar to those for Gaussian s(t).
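
The masking relations for a Gaussian signal lend themselves to a short numerical check. What follows is a sketch under assumptions, not a definitive implementation: it takes the recognition function as P = K[(S + N)^(n/2) − N^(n/2)], solves P(S₁, N₁, n) = P(δ²S₁, N₂, n) for δ², and uses the low-noise form δ² = [1 + (N/S)^(n/2)]^(2/n) − N/S. The function names are hypothetical.

```python
def recognition(S, N, n, K=1.0):
    """Recognition function P for a Gaussian signal (assumed Stevens-law form)."""
    return K * ((S + N) ** (n / 2.0) - N ** (n / 2.0))


def delta_sq(S1, N1, N2, n):
    """Signal-power factor delta^2 that keeps P constant when noise goes N1 -> N2."""
    inner = (S1 + N1) ** (n / 2.0) + N2 ** (n / 2.0) - N1 ** (n / 2.0)
    return (inner ** (2.0 / n) - N2) / S1


def delta_sq_low_noise(S, N, n):
    """Same factor when the initial noise N1 is negligible compared with N2 = N."""
    return (1.0 + (N / S) ** (n / 2.0)) ** (2.0 / n) - N / S


if __name__ == "__main__":
    for n in (1.0, 2.0, 3.0):
        print(n, delta_sq_low_noise(1.0, 4.0, n))
```

With S = 1 and N = 4 the factor exceeds 1 for n = 1 (masking), equals 1 for n = 2, and falls below 1 for n = 3.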

For n < 2, a finite value for δ is
indeed predicted; this is sufficient to 
provide a model for partial or com- 
plete masking. In any properly con- 
ceived experiment which is performed 
at and near threshold, experimental 
data can always be fitted to curves 
for δ² as functions of N₂/S₁ = N/S
within tolerable error for some value 
for n; in fact, curve fitting in this 
manner provides a mechanism for 
determining n. That an adequate fit 
may be realized is not an unexpected 
result for the simple reason that 
Equation 27 contains enough adjust- 
able parameters to provide a reason- 
able fit to any curve that is likely to 
be found by experiment. 

The particular case n = 2 in Equation 27
yields δ = 1. In other words,
biological sensory systems which are 
square law are theoretically immune 
to masking. For n > 2, theory
indicates “inverse masking.” That is, 
an increase in the noise level augments 
rather than masks the signal. Loud- 
ness recruitment in hearing shows 
n > 2 in many cases. If changes due 
to fatigue could be avoided, an aid 
to hearing might be realized by 
providing background Gaussian noise 
with large intensities at frequencies 
where n > 2 and small intensities 
elsewhere. 

Induced internal noise may not be 
negligible in masking experiments be- 
cause noise exists in the stimulus by 
intent. The effect of this noise, as 
well as of other types of internal noise, 
is to make masking somewhat more 
abrupt than that predicted by fore- 
going equations. This phenomenon 
provides one reason for the relative 
ease with which consistent data con- 
cerning masking experiments may be 
obtained.

The criterion by which existence of
masking may be demonstrated can
be generalized. Let g₁(t) and g₂(t)
be statistically independent wave
forms with finite mean square values.
Masking of g₁ by g₂ (or conversely)
occurs if


$$\langle |g_1 + g_2|^n \rangle \le \langle |g_1|^n + |g_2|^n \rangle = \langle |g_1|^n \rangle + \langle |g_2|^n \rangle$$ [30]


which can be extended to an arbitrary 
number of additive signals in an 
obvious manner. The equality in 
Equation 30, which precludes mask- 
ing, occurs in general only for n = 2. 
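
The generalized criterion can be illustrated for one tractable case. The sketch below assumes, beyond what the text states, that both wave forms are zero-mean Gaussian, for which the absolute moment has the closed form ⟨|g|ⁿ⟩ = (2σ²)^(n/2) Γ((n + 1)/2)/√π and the sum of two independent wave forms has variance σ₁² + σ₂². The function names are hypothetical.

```python
import math


def abs_moment(var, n):
    """<|g|^n> for a zero-mean Gaussian wave form with variance var."""
    return (2.0 * var) ** (n / 2.0) * math.gamma((n + 1.0) / 2.0) / math.sqrt(math.pi)


def masking_gap(var1, var2, n):
    """<|g1|^n> + <|g2|^n> - <|g1 + g2|^n> for independent Gaussian g1, g2.

    Positive: the combined response is less than the sum (masking).
    Zero: the responses add (no masking). Negative: inverse masking.
    """
    return abs_moment(var1, n) + abs_moment(var2, n) - abs_moment(var1 + var2, n)


if __name__ == "__main__":
    for n in (1.0, 2.0, 3.0):
        print(n, masking_gap(1.0, 4.0, n))
```

Under these assumptions the gap is positive for n < 2, vanishes for n = 2, and is negative for n > 2.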


A New Weber Law 


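Let Gaussian signal S be increased
to S + ΔS. A ratio


$$\frac{\Delta L}{L} = \frac{L(S + \Delta S, N, n) - L(S, N, n)}{L(S, N, n)}$$ [31]


is then formed. Next expand Equation
31 in a Taylor series in ΔS and
retain only first-order terms; this is
equivalent to using the derivative
dL (with respect to S) for ΔL. There
results


$$\frac{\Delta L}{L} \approx \frac{1}{L} \frac{\partial L}{\partial S} \, \Delta S = \frac{n}{2} \, \frac{\Delta S}{S + N}$$ [32]


which shows the important special
case


$$\left( \frac{\Delta L}{L} \right)_{\text{large } S/N} = \frac{n}{2} \, \frac{\Delta S}{S}$$ [33]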
It may be noted that Equation 33 
is the classic Weber law: ΔS/S is
independent of S for large S (although
it is presumed that S is not so
large as to introduce saturation). The
classic law is not valid at relatively
small S. An attempt to improve on
the law is suggested with a neo-Weberian
hypothesis: The subjective
measure of relative intensity change
ΔL/L is independent of stimulus
intensity S. Thus


$$\left( \frac{\Delta L}{L} \right)_{\text{large } S} = \left( \frac{\Delta L}{L} \right)_{\text{general } S}$$ [34]


If primed values denote large S 
and unprimed ones signify general S, 
there results from Equation 34 
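
The step can be sketched as follows. This is a worked derivation under stated assumptions, not necessarily the form in which the relation was originally displayed: the Gaussian-signal loudness is taken as L = K(S + N)^(n/2), the derivative dL is used for ΔL, and primed quantities refer to large S.

```latex
% Sketch only: assumes L = K (S + N)^{n/2} and the first-order
% approximation dL for \Delta L; primes denote the large-S condition.
\begin{align*}
\frac{\Delta L}{L} &\approx \frac{n}{2}\,\frac{\Delta S}{S+N}
   && \text{(general } S\text{)} \\
\frac{\Delta L'}{L'} &\approx \frac{n}{2}\,\frac{\Delta S'}{S'}
   && \text{(large } S'\text{, so that } S' + N \approx S'\text{)} \\
\frac{\Delta L}{L} = \frac{\Delta L'}{L'}
  \;&\Longrightarrow\;
  \frac{\Delta S/S}{\Delta S'/S'} \approx 1 + \frac{N}{S}
\end{align*}
```

The exponent n cancels, so that under these assumptions only the ratio S/N enters the result.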
In Figure 4 are shown two curves,
one which is for ΔS/S relative to
ΔS′/S′, and another for ΔS/S which
assumes a particular value for ΔS′/S′.
The second curve has been shifted 
horizontally as well as vertically in 
order to account for uncertainties in 
absolute noise level in an experimental 
observation. The shifted theoretical 
curve in Figure 4 compares with well- 
known data concerning hearing 
(Miller, 1947). It is of interest to note 
that the power-law exponent does 
not contribute to the result of Figure 
4. In fact, the only parameter of 
importance is the ratio S/N, which 
has been found by experiment to be 
of major importance in the case of 
hearing (Sherrick, 1959). 

The foregoing theory agrees reason- 
ably well with experimental data 
pertaining to tactile sensations, and 
other senses (Geldard, 1953). 

In the case of sine-wave excitation
in noise, there may be found


$$\frac{\Delta S / S}{\Delta S' / S'} = \frac{{}_1F_1(-n/2;\ 1;\ -S/N)}{(S/N) \; {}_1F_1(1 - n/2;\ 2;\ -S/N)}$$ [36]


which, although ominous in appear- 
ance, shows very similar behavior to 
the case of Gaussian s(t); in fact, the
result is exactly the same for n = 2. 
In order to better demonstrate the 
similarity, asymptotic expressions per- 
taining to Equation 36 are provided as 


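$$\frac{\Delta S / S}{\Delta S' / S'} \approx \begin{cases} \dfrac{n + 2}{4} + \dfrac{1}{S/N}, & S \ll N \\[2ex] 1 + \dfrac{n/2}{S/N}, & S \gg N \end{cases}$$ [37]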


which, in contrast with the case of 
Gaussian s(t), shows (weak) de- 
pendence on Exponent n. 
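
The behavior of the sine-wave difference limen function may be examined numerically. The sketch below rests on assumptions that should be kept in view: the limen ratio is taken to be ₁F₁(−n/2; 1; −x) / [x ₁F₁(1 − n/2; 2; −x)] with x = S/N, which is what the neo-Weberian hypothesis gives when applied to a loudness proportional to ₁F₁(−n/2; 1; −S/N); the Gaussian ratio is taken as 1 + N/S; and all function names are hypothetical.

```python
import math


def hyp1f1_neg(a, b, x, terms=400):
    """Evaluate 1F1(a; b; -x) for 0 <= x <= ~100 (hypothetical helper).

    Kummer's transformation, 1F1(a; b; -x) = exp(-x) * 1F1(b - a; b; x),
    replaces an alternating series by one with positive terms.
    """
    c = b - a
    term = 1.0
    total = 1.0
    for k in range(terms):
        term *= (c + k) * x / ((b + k) * (k + 1))
        total += term
    return math.exp(-x) * total


def limen_ratio_sine(snr, n):
    """(dS/S) relative to its large-S value, sine wave in noise (assumed form)."""
    num = hyp1f1_neg(-n / 2.0, 1.0, snr)
    den = snr * hyp1f1_neg(1.0 - n / 2.0, 2.0, snr)
    return num / den


def limen_ratio_gauss(snr):
    """(dS/S) relative to its large-S value, Gaussian signal (assumed form)."""
    return 1.0 + 1.0 / snr


if __name__ == "__main__":
    for snr in (0.01, 0.1, 1.0, 10.0, 50.0):
        print(snr, limen_ratio_sine(snr, 1.0), limen_ratio_gauss(snr))
```

For n = 2 the two ratios agree at every S/N; for other exponents the sine-wave ratio departs only weakly from the Gaussian one at both extremes of S/N.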

That the difference limen function 
for a sine wave may sometimes be 
found to be appreciably different from 
that for a Gaussian signal is not due 
to a weakness in foregoing mathe- 
matical manipulations (except for the 
question of induced noise as has been 
noted). Rather, it stems from con- 
siderations relevant to pattern recog- 
nition which may partly invalidate 
the elementary system of Figure 2, 
including simplified statistical decision 
concepts. 

A general observation based on the 
loudness function may be made. 
Because averaging is ideally over a 
time period that is long compared 
with detailed variations in f(t), mod- 
erate modifications of the time scale 
for f(t) are unobservable. Thus the 
theory does not apply directly to 
determinations of frequency or spec- 
trum changes; rather, only overall 
intensity changes are described. If, 
however, a sensory system displays 
resonance or frequency-dependent behavior
(which may be included in the
operator function in Equation 9), 
frequency changes result in intensity 
changes and are evaluated as such. 
Mathematical formulation applicable 
to frequency change is thus not facile 
because the particular and detailed 
character of the relevant sensory 
mechanism becomes of major importance.
It may thus be predicted
that difference limens of frequency do 
not display properties that are as 
consistent as are those for intensity. 

One aspect of the Weber law as has 
been described requires clarification. 
The equations predict finite ΔS/S for
arbitrarily small S/N. Clearly, a
sensor cannot have arbitrary sensitivity;
the curve in Figure 4 should
in fact go to infinity at the S/N ratio
applicable to threshold. The reason
for the discrepancy stems from the
approximation dS for the finite increment
ΔS. If Equation 31 is studied
in terms of a power series in ΔS,
it is found that the true curve rises
faster than Figure 4 for small S/N
ratios. However, until the curve gets
to within one or two jnd’s (just noticeable
differences) of absolute threshold,
the error inherent in employing dS
for ΔS is small.


The Necessity for Patterns 


It is perhaps self-evident that the 
model in Figure 2 can not recognize 
anything but intensity change; for it 
represents only a single sensory cell 
(or group of cells which are excited in 
some way). Certainly, ability for 
recognition of a sustained vowel sound 
or a persistent smell is not implied. 

In order to recognize a tone in noise, 
for example, more than one sensory 
cell (or more than one patch of cells) 
must be excited so that signal and 
noise components contribute differ- 
ently. In other words, a “transducer” 
is required which provides a pattern

Fig. 5. Various basilar membrane patterns
for a tone in noise. 


of excitation upon an array of cells 
such that spectral pattern preferences 
occur. The cochlea performs this 
function in hearing. For smell, the 
system might consist of semispecific 
chemically responsive cells whose 
neural responses organize to a pattern. 

It is instructive to visualize the 

pattern of (detected) intensities along 
the basilar membrane of the inner ear 
(von Békésy, 1960). Patterns due to 
s(t) alone, n(t) alone, and the sum
s(t) + n(t) are indicated in Figure 5. 
With noise, an observer hears a more- 
or-less corrupted version of the tone. 
It has been observed that a change in 
pitch (which is the subjective measure 
for frequency) occurs in such in- 
stances (Egan & Meyer, 1950; Schu- 
bert, 1950). 

Observe in Figure 5 that each patch 
of sensory (hair) cells is subjected to a 
different S/N ratio and thus relative 
suppression must vary in the neural 
representation of points along the 
basilar membrane. The dashed curve 
in Figure 5 attempts to indicate the
“effective” pattern, which is the suppressed
pattern after removal of
noise. In general, the centroid of the 
effective pattern is shifted from that 
applicable in the absence of noise. A 
pattern hypothesis states that pitch is 
discerned by matching suitable mem- 
ory-stored patterns with effective 
patterns. Accordingly, pitch depends 
to some extent on relative suppression;
monaural “diplacusis” or pitch change
can be associated with partial mask- 
ing. The amount of pitch change 
depends on the asymmetry of the 
noise-free basilar membrane pattern. 
On the basis of available data con- 
cerning such patterns, an upward 
shift in pitch with noise is suggested 
at high frequencies (Figure 5b) and a 
rather uncertain shift is implied at 
low frequencies (Figure 5a). 

In the case of hearing, there should 
be employed a number of loudness 
converters in order to represent a 
number of cell patches. These loud- 
ness converters should be suitably 
excited as in the actual cochlea, which 
can be done by placing them along an 
artificial, nonuniform, dispersive, 
transmission line which represents the 
cochlea. Outputs from individual 
converters do not excite individual 
decision systems; rather, recognition 
of entire patterns is required (where 
such recognition is at least partly a

Fig. 6. Skeletal outline for a pattern
recognizing hearing system.

cortical function). The skeletal system
in Figure 6 is thus deduced (Caldwell
et al., 1962). It is of interest
to note that, if signal and noise 
produce similar basilar membrane 
patterns and if loudness converters are 
all the same, the system in Figure 6 
has the valid equivalent of Figure 2. 

Although Figure 6 represents a far 
more complex model than has cus- 
tomarily been studied in hearing, it is 
yet deficient in certain (second-order) 
respects. First, a feedback mech- 
anism must average the outputs from 
the loudness converters in order to 
suitably control stapedial and tym- 
panic muscles. Second, each loudness 
converter must be equipped with some 
type of weak energy source with 
relatively slow recovery so that fatigue 
of individual patches of hair cells may 
be represented. Then some method 
for including cross-connecting and 
efferent nerve fibers (so as to realize 
mutual inhibition in part) should be 
devised, which is not possible in view 
of the present state of knowledge 
regarding the inner ear. 

When Figure 6 is augmented, the 
meaning of many classical experi- 
ments in psychoacoustics can be 
questioned. A single model such as 
that of Figure 2 ignores the multiple 
channels and multiple time constants 
in Figure 6; every different testing 
dogma can thus lead to a different 
test result. 

It should not be assumed that the 
complexity of Figure 6 is undesirable. 
To the contrary, the more realistic is 
a model, the better it is in guiding 
meaningful research and in interpret- 
ing test results. The simple system 
in Figure 2 is entirely adequate 
in many situations, especially near 
threshold and when s(t) and n(t) have
reasonably similar spectra. Of course, 
the loudness converter in Figure 2 
forms a basic part of Figure 6.


Conclusion


The principal result of this paper is 
a set of hypotheses. One of these is 
basic to the others: The loudness 
function hypothesis defines a de- 
tailed wave-form processing model 
which facilitates calculations using 
arbitrary stimulus wave forms. The 
loudness model agrees with empirical 
observations. 

The basic model is augmented with 
a neo-Weberian hypothesis in order to 
obtain a quantitative description for 
difference limens which applies at 
both small and medium stimulus intensities.
The basic loudness model
is augmented with a recognition func- 
tion hypothesis which leads to a 
quantitative model for the phenom- 
enon of partial masking. A pattern 
hypothesis augments the basic loud- 
ness model in order to explain a type 
of diplacusis. This hypothesis also 
leads to a sophisticated model for 
hearing which aids in clarifying gen- 
eral mechanisms and which suggests 
statistical decision concepts in per- 
ception which are based on pattern 
recognition. 

Throughout the development, two 
types of stimuli, a sine wave and a 
Gaussian random variable, have been 
employed and pertinent equations 
set forth. Because these two stimuli 
behave similarly, justification is pro- 
vided for approximation of a sine 
wave with a band of Gaussian noise 
and conversely. Mathematical theory 
is for the most part generalized so as 
to apply to a variety of sensory 
systems.


Bayesian statistics, a currently controversial viewpoint concerning
statistical inference, is based on a definition of probability as a par- 
ticular measure of the opinions of ideally consistent people. Statistical 
inference is modification of these opinions in the light of evidence, and 
Bayes’ theorem specifies how such modifications should be made. The 
tools of Bayesian statistics include the theory of specific distributions 
and the principle of stable estimation, which specifies when actual prior 
opinions may be satisfactorily approximated by a uniform distribution. 
A common feature of many classical significance tests is that a sharp 
null hypothesis is compared with a diffuse alternative hypothesis. 
Often evidence which, for a Bayesian statistician, strikingly supports 
the null hypothesis leads to rejection of that hypothesis by standard 
classical procedures. The likelihood principle emphasized in Bayesian 
statistics implies, among other things, that the rules governing when 
data collection stops are irrelevant to data interpretation. It is 
entirely appropriate to collect data until a point has been proven or 
disproven, or until the data collector runs out of time, money, or
patience.


The main purpose of this paper is 
to introduce psychologists to the 
Bayesian outlook in statistics, a new 
fabric with some very old threads. 
Although this purpose demands much 
repetition of ideas published elsewhere,
even Bayesian specialists will
find some remarks and derivations 
hitherto unpublished and perhaps 
quite new. The empirical scientist 
more interested in the ideas and im- 
plications of Bayesian statistics than 
in the mathematical details can safely 
skip almost all the equations; detours 
and parallel verbal explanations are 
provided. The textbook that would 
make all the Bayesian procedures 
mentioned in this paper readily avail- 
able to experimenting psychologists 
does not yet exist, and perhaps it cannot
exist soon; Bayesian statistics as
a coherent body of thought is still too
new and incomplete.

Bayes’ theorem is a simple and
fundamental fact about probability
that seems to have been clear to
Thomas Bayes when he wrote his 
famous article published in 1763 
(recently reprinted), though he did 
not state it there explicitly. Bayesian 
statistics is so named for the rather 
inadequate reason that it has many 
more occasions to apply Bayes’ theo- 
rem than classical statistics has. 
Thus, from a very broad point of 
view, Bayesian statistics dates back 
at least to 1763. 

From a stricter point of view, 
Bayesian statistics might properly be 
said to have begun in 1959 with the 
publication of Probability and Sta- 
tistics for Business Decisions, by 
Robert Schlaifer. This introductory 
text presented for the first time 
practical implementation of the key 
ideas of Bayesian statistics: that 
probability is orderly opinion, and 
that inference from data is nothing 
other than the revision of such opinion 
in the light of relevant new informa- 
tion. Schlaifer (1961) has since 
published another introductory text, 
less strongly slanted toward business 
applications than his first. And 
Raiffa and Schlaifer (1961) have pub- 
lished a relatively mathematical book. 
Some other works in current Bayesian 
statistics are by Anscombe (1961), de 
Finetti (1959), de Finetti and Savage 
(1962), Grayson (1960), Lindley 
(1961), Pratt (1961), and Savage et 
al. (1962). 

The philosophical and mathematical 
basis of Bayesian statistics has, in 
addition to its ancient roots, a con- 
siderable modern history. Two lines 
of development important for it are 
the ideas of statistical decision theory, 
based on the game-theoretic work of 
Borel (1921), von Neumann (1928), 
and von Neumann and Morgenstern 
(1947), and the statistical work of 
Neyman (1937, 1938b, for example), 
Wald (1942, 1955, for example), and 


W. EDWARDS, H. LINDMAN, AND L. J. SAVAGE


others; and the personalistic defini- 
tion of probability, which Ramsey 
(1931) and de Finetti (1930, 1937) 
crystallized. Other pioneers of per- 
sonal probability are Borel (1924), 
Good (1950, 1960), and Koopman 
(1940a, 1940b, 1941). Decision theory 
and personal probability fused in the 
work of Ramsey (1931), before either 
was very mature. By 1954, there was 
great progress in both lines for 
Savage’s The Foundations of Statistics 
to draw on. Though this book failed 
in its announced object of satisfying 
popular non-Bayesian statistics in 
terms of personal probability and 
utility, it seems to have been of some 
service toward the development of 
Bayesian statistics. Jeffreys (1931, 
1939) has pioneered extensively in 
applications of Bayes’ theorem to 
statistical problems. He is one of the 
founders of Bayesian statistics, though 
he might reject identification with the 
viewpoint of this paper because of its 
espousal of personal probabilities. 
These two, inevitably inadequate, 
paragraphs are our main attempt in 
this paper to give credit where it is 
due. Important authors have not 
been listed, and for those that have 
been, we have given mainly one early 
and one late reference only. Much 
more information and extensive bib- 
liographies will be found in Savage 
et al. (1962) and Savage (1954, 
1962a). 

We shall, where appropriate, com- 
pare the Bayesian approach with a 
loosely defined set of ideas here 
labeled the classical approach, or 
classical statistics. You cannot but 
be familiar with many of these ideas, 
for what you learned about statistical 
inference in your elementary statistics 
course was some blend of them. They 
have been directed largely toward the 
topics of testing hypotheses and 
interval estimation, and they fall
roughly into two somewhat conflicting
doctrines associated with the names 
of R. A. Fisher (1925, 1956) for one, 
and Jerzy Neyman (e.g. 1937, 1938b) 
and Egon Pearson for the other. We 
do not try to portray any particular 
version of the classical approach; our
real comparison is between such pro- 
cedures as a Bayesian would employ 
in an article submitted to the Journal 
of Experimental Psychology, say, and 
those now typically found in that 
journal. The fathers of the classical 
approach might not fully approve of 
either. Similarly, though we adopt for 
conciseness an idiom that purports to 
define the Bayesian position, there 
must be at least as many Bayesian 
positions as there are Bayesians. Still, 
as philosophies go, the unanimity 
among Bayesians reared apart is re- 
markable and an encouraging symp- 
tom of the cogency of their ideas. 

In some respects Bayesian statistics 
is a reversion to the statistical spirit 
of the eighteenth and nineteenth 
centuries; in others, no less essential, 
it is an outgrowth of that modern 
movement here called classical. The 
latter, in coping with the consequences 
of its view about the foundations of 
probability which made useless, if not 
meaningless, the probability that a 
hypothesis is true, sought and found 
techniques for statistical inference 
which did not attach probabilities to 
hypotheses. These intended channels 
of escape have now, Bayesians believe, 
led to reinstatement of the probabili- 
ties of hypotheses and a return of 
statistical inference to its original line 
of development. In this return, 
mathematics, formulations, problems, 
and such vital tools as distribution 
theory and tables of functions are 
borrowed from extrastatistical proba- 
bility theory and from classical sta- 
tistics itself. All the elements of 
Bayesian statistics, except perhaps
the personalistic view of probability,
were invented and developed within, 
or before, the classical approach to 
statistics; only their combination into 
specific techniques for statistical infer- 
ence is at all new. 

The Bayesian approach is a common
sense approach. It is simply a set of
techniques for orderly expression and 
revision of your opinions with due 
regard for internal consistency among 
their various aspects and for the data. 
Naturally, then, much that Bayesians 
say about inference from data has 
been said before by experienced, 
intuitive, sophisticated empirical sci- 
entists and statisticians. In fact, 
when a Bayesian procedure violates 
your intuition, reflection is likely to 
show the procedure to have been 
incorrectly applied. If classically 
trained intuitions do have some con- 
flicts, these often prove transient. 


ELEMENTS OF BAYESIAN STATISTICS 


Two basic ideas which come to- 
gether in Bayesian statistics, as we 
have said, are the decision-theoretic 
formulation of statistical inference 
and the notion of personal probability. 

Statistics and decisions. Prior to a 
paper by Neyman (1938a), classical 
statistical inference was usually expressed
in terms of justifying propositions
on the basis of data. Typical
propositions were: Point estimates;
the best guess for the unknown number
μ is m. Interval estimates; μ is
between m₁ and m₂. Rejection of
hypotheses; μ is not 0. Neyman’s
(1938a, 1957) slogan “inductive be- 
havior” emphasized the importance of 
action, as opposed to assertion, in the 
face of uncertainty. The decision- 
theoretic, or economic, view of sta- 
tistics was advanced with particular 
vigor by Wald (1942). To illustrate, 
in the decision-theoretic outlook a
point estimate is a decision to act,
in some specific context, as though μ
were m, not to assert something
about μ. Some classical statisticians,
notably Fisher (1956, Ch. 4), have 
hotly rejected the decision-theoretic 
outlook. 

While Bayesian statistics owes much 
to the decision-theoretic outlook, and 
while we personally are inclined to 
side with it, the issue is not crucial 
to a Bayesian. No one will deny that 
economic problems of behavior in the 
face of uncertainty concern statistics, 
even in its most “pure” contexts. 
For example, ‘Would it be wise, in the 
light of what has just been observed, 
to attempt such and such a year’s in- 
vestigation?” The controversial issue 
is only whether such economic prob- 
lems are a good paradigm of all 
statistical problems. For Bayesians, 
all uncertainties are measured by 
probabilities, and these probabilities 
(along with the here less emphasized 
concept of utilities) are the key to all 
problems of economic uncertainty. 
Such a view deprives debate about 
whether all problems of uncertainty 
are economic of urgency. On the 
other hand, economic definitions of 
personal probability seem, at least to 
us, invaluable for communication and 
perhaps indispensable for operational 
definition of the concept. 

A Bayesian can reflect on his cur- 
rent opinion (and how he should revise 
it on the basis of data) without any 
reference to the actual economic sig- 
nificance, if any, that his opinion may 
have. This paper ignores economic 
considerations, important though they 
are even for pure science, except for 
brief digressions. So doing may com- 
bat the misapprehension that Bayes- 
ian statistics is primarily for business, 
not science. 

Personal probability. With rare 
exceptions, statisticians who conceive
of probabilities exclusively as limits of
relative frequencies are agreed that 
uncertainty about matters of fact is 
ordinarily not measurable by proba- 
bility. Some of them would brand as 
nonsense the probability that weight- 
lessness decreases visual acuity; for 
others the probability of this hy- 
pothesis would be 1 or 0 according as 
it is in fact true or false. Classical 
statistics is characterized by efforts to 
reformulate inference about such hy- 
potheses without reference to their 
probabilities, especially initial proba- 
bilities. 

These efforts have been many and 
ingenious. It is disagreement about 
which of them to espouse, incidentally, 
that distinguishes the two main 
classical schools of statistics. The 
related ideas of significance levels, 
“errors of the first kind,” and con- 
fidence levels, and the conflicting idea 
of fiducial probabilities are all in- 
tended to satisfy the urge to know 
how sure you are after looking at the 
data, while outlawing the question of 
how sure you were before. In our 
opinion, the quest for inference with- 
out initial probabilities has failed, 
inevitably. 

You may be asking, “If a proba- 
bility is not a relative frequency or a 
hypothetical limiting relative fre- 
quency, what is it? If, when I 
evaluate the probability of getting 
heads when flipping a certain coin as 
.5, I do not mean that if the coin were 
flipped very often the relative fre- 
quency of heads to total flips would be 
arbitrarily close to .5, then what do 
I mean?” 

We think you mean something 
about yourself as well as about the 
coin. Would you not say, ‘‘Heads on 
the next flip has probability .5” if and 
only if you would as soon guess heads 
as not, even if there were some important
reward for being right? If so,
your sense of “probability” is ours;
even if you would not, you begin to 
see from this example what we mean 
by “probability,” or “personal proba- 
bility.” To see how far this notion is 
from relative frequencies, imagine 
being reliably informed that the coin 
has either two heads or two tails. 
You may still find that if you had to 
guess the outcome of the next flip for a 
large prize you would not lift a finger 
to shift your guess from heads to tails 
or vice versa. 

Probabilities other than .5 are 
defined in a similar spirit by one of 
several mutually harmonious devices 
(Savage, 1954, Ch. 1-4). One that is 
particularly vivid and practical, if 
not quite rigorous as stated here, is 
this. For you, now, the probability 
P(A) of an event A is the price you 
would just be willing to pay in ex- 
change for a dollar to be paid to you 
in case A is true. Thus, rain to- 
morrow has probability 1/3 for you 
if you would pay just $.33 now in 
exchange for $1.00 payable to you in 
the event of rain tomorrow. 

A system of personal probabilities, 
or prices for contingent benefits, is 
inconsistent if a person who acts in 
accordance with it can be trapped 
into accepting a combination of bets 
that assures him of a loss no matter 
what happens. Necessary and suffi- 
cient conditions for consistency are 
the following, which are familiar as a 
basis for the whole mathematical 
theory of probability: 


$$0 \le P(A) \le P(S) = 1,$$
$$P(A \cup B) = P(A) + P(B),$$


where S is the tautological, or uni- 
versal, event; A and B are any 
two incompatible, or nonintersecting, 
events; and A ∪ B is the event that 
either A or B is true, or the union of 
A and B. Real people often make 
choices that reflect violations of these 
rules, especially the second, which is 
why personalists emphasize that per- 
sonal probability is orderly, or con- 
sistent, opinion, rather than just any 
opinion. One of us has presented 
elsewhere a model for probabilities 
inferred from real choices that does 
not include the second consistency 
requirement listed above (Edwards, 
1962b). It is important to keep clear 
the distinction between the some- 
what idealized consistent personal 
probabilities that are the subject of 
this paper and the usually inconsistent 
subjective probabilities that can be 
inferred from real human choices 
among bets, and the words “personal” 
and “subjective” here help do so. 
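
The trap described above can be sketched numerically. The following is only an illustration, not part of the original argument: the function name `gains_by_outcome` and the example prices are hypothetical. For two incompatible events A and B, a person whose prices violate the second rule can be led into trades that lose the same amount whatever happens.

```python
def gains_by_outcome(p_a, p_b, p_union):
    """Net gain, in dollars, in each possible outcome for a person who
    trades $1 tickets at his own prices against an opponent who picks
    the direction of every trade. A and B are incompatible events."""
    # If the separate tickets are priced above the union ticket, the
    # opponent sells him tickets on A and on B and buys the ticket on
    # (A or B) from him; otherwise the opponent does the reverse.
    sign = 1.0 if p_a + p_b > p_union else -1.0
    gains = {}
    for outcome in ("A", "B", "neither"):
        pays_off = 1.0 if outcome != "neither" else 0.0
        bought = pays_off - (p_a + p_b)   # tickets on A and on B
        sold = p_union - pays_off         # ticket on (A or B)
        gains[outcome] = sign * (bought + sold)
    return gains


# Hypothetical inconsistent prices: .4 + .4 is not .7.
inconsistent = gains_by_outcome(0.4, 0.4, 0.7)
# Hypothetical consistent prices: .3 + .4 = .7.
consistent = gains_by_outcome(0.3, 0.4, 0.7)
```

With the inconsistent prices the gain is negative in all three outcomes; with the consistent prices no such combination of these three tickets produces a sure loss.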

Your opinions about a coin can of 
course differ from your neighbor's. 
For one thing, you and he may have 
different bodies of relevant informa- 
tion. We doubt that this is the only 
legitimate source of difference of 
opinion. Hence the personal in per- 
sonal probability. Any probability 
should in principle be indexed with the 
name of the person, or people, whose 
opinion it describes. We usually 
leave the indexing unexpressed but 
underline it from time to time with 
phrases like “the probability for you 
that H is true.” 

Although your initial opinion about 
future behavior of a coin may differ 
radically from your neighbor’s, your 
opinion and his will ordinarily be so 
transformed by application of Bayes’ 
theorem to the results of a long 
sequence of experimental flips as 
to become nearly indistinguishable. 
This approximate merging of initially 
divergent opinions is, we think, one 
reason why empirical research is 
called “objective.” Personal proba- 
bility is sometimes dismissed with the 
assertion that scientific knowledge 
cannot be mere opinion. Yet, obvi- 
ously, no sharp lines separate the 
conjecture that many human cancers 
may be caused by viruses, the opinion 
that many are caused by smoking, 
and the “knowledge” that many have 
been caused by radiation. 

Conditional probabilities and Bayes’ 
theorem. In the spirit of the rough 
definition of the probability P(A) of 
an event A given above, the condi- 
tional probability P(D|H) of an 
event D given another H is the 
amount you would be willing to pay 
in exchange for a dollar to be paid to 
you in case D is true, with the further 
provision that all transactions are 
canceled unless H is true. As is not 
hard to see, P(D ∩ H) is P(D|H)P(H), 
where D ∩ H is the event that D and 
H are both true, or the intersection 
of D and H. Therefore, 


$$P(D \mid H) = \frac{P(D \cap H)}{P(H)},$$  [1]


unless P(H) = 0. 

Conditional probabilities are the 
probabilistic expression of learning 
from experience. It can be argued 
that the probability of D for you—the 
consistent you—after learning that H 
is in fact true is P(D|H). Thus, 
after you learn that H is true, the new 
system of numbers P(D|H) for a 
specific H comes to play the role that 
was played by the old system P (D) 
before. 

Although the events D and H are 
arbitrary, the initial letters of Data 
and Hypothesis are suggestive names 
for them. Of the three probabilities 
in Equation 1, P(H) might be illus- 
trated by the sentence: “The proba- 
bility for you, now, that Russia will 
use a booster rocket bigger than our 
planned Saturn booster within the 
next year is .8.” The probability 
P(D ∩ H) is the probability of the 
joint occurrence of two events re- 
garded as one event, for instance: 
“The probability for you, now, that 
the next manned space capsule to 
enter space will contain three men 
and also that Russia will use a booster 
rocket bigger than our planned Saturn 
booster within the next year is .2.” 
According to Equation 1, the proba- 
bility for you, now, that the next 
manned space capsule to enter space 
will contain three men, given that 
Russia will use a booster rocket bigger 
than our planned Saturn booster 
within the next year is .2/.8 = .25. 

A little algebra now leads to a basic 
form of Bayes’ theorem : 


$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)},$$  [2]

provided P(D) and P(H) are not 0. 
In fact, if the roles of D and H in 
Equation 1 are interchanged, the old 
form of Equation 1 and the new form 
can be expressed symmetrically, thus: 


$$\frac{P(D \mid H)}{P(D)} = \frac{P(D \cap H)}{P(D)\,P(H)} = \frac{P(H \mid D)}{P(H)},$$  [3]


which obviously implies Equation 2. 
A suggestive interpretation of Equa- 
tion 3 is that the relevance of H to D 
equals the relevance of D to H. 
Reformulations of Bayes’ theorem 
apply to continuous parameters or 
data. In particular, if a parameter 
(or set of parameters) λ has a prior 
probability density function u(λ), and 
if x is a random variable (or a set of 
random variables such as a set of 
measurements) for which v(x|λ) is the 
density of x given λ and v(x) is the 
density of x, then the posterior 
probability density of λ given x is 


$$u(\lambda \mid x) = \frac{v(x \mid \lambda)\,u(\lambda)}{v(x)}.$$  [4]


There are of course still other possi- 
bilities such as forms of Bayes’ 
theorem in which λ but not x, or x but 
not λ, is continuous. A complete and 
compact generalization is available 
and technically necessary but need 
not be presented here. 

In Equation 2, D may be a par- 
ticular observation or a set of data 
regarded as a datum and H some 
hypothesis, or putative fact. Then 
Equation 2 prescribes the consistent 
revision of your opinions about the 
probability of H in the light of the 
datum D—similarly for Equation 4. 
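
As a check on the arithmetic, Equations 1 through 3 can be worked through with the rocket example. This is a minimal sketch: P(H) = .8 and P(D ∩ H) = .2 come from the text, but the value P(D) = .3 is an assumption made here purely for illustration, and the helper name `conditional` is hypothetical.

```python
def conditional(p_joint, p_condition):
    """Equation 1: P(D|H) = P(D and H) / P(H), undefined when P(H) = 0."""
    if p_condition == 0:
        raise ValueError("conditional probability is undefined when P(H) = 0")
    return p_joint / p_condition


p_h = 0.8          # from the text: P(H)
p_d_and_h = 0.2    # from the text: P(D and H)
p_d = 0.3          # ASSUMPTION for illustration only; not given in the text

p_d_given_h = conditional(p_d_and_h, p_h)    # Equation 1
p_h_given_d = p_d_given_h * p_h / p_d        # Equation 2 (Bayes' theorem)

relevance_of_h_to_d = p_d_given_h / p_d      # left side of Equation 3
relevance_of_d_to_h = p_h_given_d / p_h      # right side of Equation 3
```

Under the assumed P(D), the two relevance ratios agree, as Equation 3 requires for any consistent set of values.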

In typical applications of Bayes’ 
theorem, each of the four probabilities 
in Equation 2 performs a different 
function, as will soon be explained. 
Yet they are very symmetrically re- 
lated to each other, as Equation 3 
brings out, and are all the same kind 
of animal. In particular, all proba- 
bilities are really conditional. Thus, 
P(H) is the probability of the hy- 
pothesis H for you conditional on all 
you know, or knew, about H prior 
to learning D; and P(H|D) is the 
probability of H conditional on that 
same background knowledge together 
with D. 

Again, the four probabilities in 
Equation 2 are personal probabilities. 
This does not of course exclude any 
of them from also being frequencies, 
ratios of favorable to total possibili- 
ties, or numbers arrived at by any 
other calculation that helps you form 
your personal opinions. But some 
are, so to speak, more personal than 
others. In many applications, prac- 
tically all concerned find themselves 
in substantial agreement with respect 
to P(D|H); or P(D|H) is public, as 
we say. This happens when P(D|H) 
flows from some simple model that 
the scientists, or others, concerned 
accept as an approximate description 
of their opinion about the situation in 
which the datum was obtained. A 
traditional example of such a sta- 
tistical model is that of drawing a ball 
from an urn known to contain some 
balls, each either black or white. If 
a series of balls is drawn from the urn, 
and after each draw the ball is replaced 
and the urn thoroughly shaken, most 
men will agree at least tentatively 
that the probability of drawing a 
particular sequence D (such as black, 
white, black, black) given the hy- 
pothesis that there are B black and 
W white balls in the urn is 


$$\left(\frac{B}{B+W}\right)^{b} \left(\frac{W}{B+W}\right)^{w},$$

where b is the number of black, and 
w the number of white, balls in the 
sequence D. 
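
The urn model can be written out directly. The sketch below is illustrative only: the function name `sequence_probability` and the urn composition (2 black, 1 white) are hypothetical choices, while the sequence black, white, black, black is the one mentioned above.

```python
def sequence_probability(sequence, black, white):
    """Probability of one particular sequence of draws, with replacement,
    from an urn holding `black` black balls and `white` white balls."""
    b = sequence.count("black")    # number of black balls in the sequence
    w = sequence.count("white")    # number of white balls in the sequence
    total = black + white
    return (black / total) ** b * (white / total) ** w


datum = ["black", "white", "black", "black"]
reordered = ["black", "black", "black", "white"]

p_datum = sequence_probability(datum, black=2, white=1)
p_reordered = sequence_probability(reordered, black=2, white=1)
```

Both sequences receive the same probability, (2/3)^3 (1/3) = 8/81, because in this model only the counts b and w matter and not the order of the draws.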

Even the best models have an 
element of approximation. For ex- 
ample, the probability of drawing any 
sequence D of black and white balls 
from an urn of composition H depends, 
in this model, only on the number of 
black balls and white ones in D, not 
on the order in which they appeared. 
This may express your opinion in a 
specific situation very well, but not 
well enough to be retained if D should 
happen to consist of 50 black balls 
followed by 50 white ones. Idio- 
matically, such a datum convinces you 
that this particular model is a wrong 
description of the world. Philo- 
sophically, however, the model was 
not a description of the world but of 
your opinions, and to know that it was 
not quite correct, you had at most to 
reflect on this datum, not necessarily 
to observe it. In many scientific 
contexts, the public model behind 
P(D|H) may include the notions of 
random sampling from a well-defined 
population, as in this example. But 
precise definition of the population 
may be difficult or impossible, and 
a sample whose randomness would 
thoroughly satisfy you, let alone your 
neighbor in science, can be hard to 
draw. 

In some cases P(D|H) does not 
command general agreement at all. 
What is the probability of the actual 
seasonal color changes on Mars if 
there is life there? What is this 
probability if there is no life there? 
Much discussion of life on Mars has 
not removed these questions from 
debate. 

Public models, then, are never 
perfect and often are not available. 
Nevertheless, those applications of 
inductive inference, or probabilistic 
reasoning, that are called statistical 
seem to be characterized by tentative 
public agreement on some model and 
provisional work within it. Rough 
characterization of statistics by the 
relative publicness of its models is not 
necessarily in conflict with attempts 
to characterize it as the study of 
numerous repetitions (Bartlett, in 
Savage et al., 1962, pp. 36-38). This 
characterization is intended to dis- 
tinguish statistical applications of 
Bayes’ theorem from many other 
applications to scientific, economic, 
military, and other contexts. In some 
of these nonstatistical contexts, it is 
appropriate to substitute the judg- 
ment of experts for a public model as 
the source of P (D | H) (see for example 
Edwards, 1962a, 1963). 

The other probabilities in Equation 
2 are often not at all public. Reason- 
able men may differ about them, even 
if they share a statistical model that 
specifies P(D|H). People do, how- 
ever, often differ much more about 
P(H) and P(D) than about P(H|D), 
for evidence can bring initially diver- 
gent opinions into near agreement. 

The probability P(D) is usually of 
little direct interest, and intuition is 
often silent about it. It is typically 
calculated, or eliminated, as follows. 
When there is a statistical model, H 
is usually regarded as one of a list, or 
partition, of mutually exclusive and 
exhaustive hypotheses H_i such that 
the P(D|H_i) are all equally public, or 
part of the statistical model. Since 
Σ_i P(H_i|D) must be 1, Equation 2 
implies that 


$$P(D) = \sum_i P(D \mid H_i)\,P(H_i).$$


The choice of the partition H_i is of 
practical importance but largely ar- 
bitrary. For example, tomorrow will 
be “fair” or “foul,” but these two 
hypotheses can themselves be sub- 
divided and resubdivided. Equation 
2 is of course true for all partitions 
but is more useful for some than for 
others. As a science advances, parti- 
tions originally not even dreamt of 
become the important ones (Sinclair, 
1960). In principle, room should al- 
ways be left for “some other” ex- 
planation. Since P(D |H) can hardly 
be public when H is “some other 
explanation,” the catchall hypothesis 
is usually handled in part by studying 
the situation conditionally on denial 
of the catchall and in part by informal 
appraisal of whether any of the 
explicit hypotheses fit the facts well 
enough to maintain this denial. Good 
illustrations are Urey (1962) and 
Bridgman (1960). 
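
A small sketch may make the role of the partition concrete. It applies Equation 2 over a finite partition, computing P(D) as the sum of P(D|H_i)P(H_i). Everything specific here is an assumption for illustration: an urn of three balls, a uniform prior over its four possible compositions, no catchall hypothesis, and the hypothetical names `sequence_likelihood` and `bayes_update`.

```python
def sequence_likelihood(b, w, black, total):
    """Urn model: probability of a sequence with b black and w white draws
    when the urn holds `black` black balls out of `total`."""
    return (black / total) ** b * ((total - black) / total) ** w


def bayes_update(prior, likelihood):
    """Equation 2 over a partition; both arguments are dicts keyed by hypothesis."""
    p_d = sum(likelihood[h] * prior[h] for h in prior)   # P(D)
    if p_d == 0:
        raise ValueError("P(D) = 0: the datum is impossible under every hypothesis")
    return {h: likelihood[h] * prior[h] / p_d for h in prior}


TOTAL = 3                                              # hypothetical urn size
prior = {black: 0.25 for black in range(TOTAL + 1)}    # assumed uniform prior
# Datum: black, white, black, black, so b = 3 and w = 1.
likelihood = {black: sequence_likelihood(3, 1, black, TOTAL) for black in prior}
posterior = bayes_update(prior, likelihood)
```

Under these assumptions the all-black and all-white urns are excluded by the datum, and the remaining probability divides .2 to .8 between one and two black balls.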

In statistical practice, the partition 
is ordinarily continuous, which means 
roughly that H_i is replaced by a 
parameter λ (which may have more 
than one dimension) with an initial 
probability density u(λ). In this 
case, 


$$P(D) = \int P(D \mid \lambda)\,u(\lambda)\,d\lambda.$$


Similarly, P(D), P(D|H_i), and 
P(D|λ) are replaced by probability 
densities in D if D is (absolutely) con- 
tinuously distributed. 

P(H|D) or u(λ|D), the usual out- 
put of a Bayesian calculation, seems 
to be exactly the kind of information 
that we all want as a guide to thought 
and action in the light of an observa- 
tional process. It is the probability 
for you that the hypothesis in question 
is true, on the basis of all your in- 
formation, including, but not re- 
stricted to, the observation D. 


PRINCIPLE OF STABLE ESTIMATION 


Problem of prior probabilities. Since 
P(D|H) is often reasonably public 
and P(H|D) is usually just what the 
scientist wants, the reason classical 
statisticians do not base their pro- 
cedures on Equations 2 and 4 must, 
and does, lie in P (H), the prior proba- 
bility of the hypothesis. We have 
already discussed the most frequent 
objection to attaching a probability to 
a hypothesis and have shown briefly 
how the definition of personal proba- 
bility answers that objection. We 
must now examine the practical prob- 
lem of determining P(H). Without 
P(H), Equations 2 and 4 cannot yield 
P(H|D). But since P(H) is a 
personal probability, is it not likely 
to be both vague and variable, and 
subjective to boot, and therefore use- 
less for public scientific purposes? 

Yes, prior probabilities often are 
quite vague and variable, but they 
are not necessarily useless on that 
account (Borel, 1924). The impact 
of actual vagueness and variability of 
prior probabilities differs greatly from 
one problem to another. They fre- 
quently have but negligible effect on 
the conclusions obtained from Bayes’ 
theorem, although utterly unlimited 
vagueness and variability would have 
utterly unlimited effect. If observa- 
tions are precise, in a certain sense, 
relative to the prior distribution on 
which they bear, then the form and 
properties of the prior distribution 
have negligible influence on the pos- 
terior distribution. From a practical 
point of view, then, the untrammeled 
subjectivity of opinion about a pa- 
rameter ceases to apply as soon as 
much data become available. More 
generally, two people with widely 
divergent prior opinions but reason- 
ably open minds will be forced into 
arbitrarily close agreement about 
future observations by a sufficient 
amount of data. An advanced mathe- 
matical expression of this phenomenon 
is in Blackwell and Dubins (1962). 
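
The merging of opinions can be illustrated with a coin. The sketch below is a grid approximation under stated assumptions, not a general result: the two prior densities, the data (600 heads in 1,000 flips), and the function name `predictive_heads` are all hypothetical.

```python
import math


def predictive_heads(prior_density, heads, tails, grid_size=2001):
    """Probability of heads on the next flip, given the prior density of
    the coin's heads probability and the flips seen so far, computed on
    a grid of midpoints in (0, 1)."""
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    # Work with logarithms so that long sequences do not underflow.
    log_post = [
        math.log(prior_density(lam))
        + heads * math.log(lam)
        + tails * math.log(1.0 - lam)
        for lam in grid
    ]
    top = max(log_post)
    weights = [math.exp(value - top) for value in log_post]
    total = sum(weights)
    return sum(lam * wt for lam, wt in zip(grid, weights)) / total


def your_prior(lam):
    return (1.0 - lam) ** 4    # hypothetical: you expect mostly tails


def neighbor_prior(lam):
    return lam ** 4            # hypothetical: your neighbor expects mostly heads


before = (predictive_heads(your_prior, 0, 0),
          predictive_heads(neighbor_prior, 0, 0))
after = (predictive_heads(your_prior, 600, 400),
         predictive_heads(neighbor_prior, 600, 400))
```

Before any flips the two predictions are far apart (about 1/6 and 5/6); after the assumed 1,000 flips they differ by less than one percentage point.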

When prior distributions can be re- 
garded as essentially uniform. Fre- 
quently, the data so completely 
control your posterior opinion that 
there is no practical need to attend to 
the details of your prior opinion. 
For example, consider taking your 
temperature. 

Headachy and hot, you are con- 
vinced that you have a fever but are 
not sure how much. You do not hold 
the interval 100.5°-101° even 20 times 
more probable than the interval 101°- 
101.5° on the basis of your malaise 
alone. But now you take your tem- 
perature with a thermometer that you 
strongly believe to be accurate and 
find yourself willing to give much 
more than 20 to 1 odds in favor of the 
half-degree centered at the ther- 
mometer reading. 

Your prior opinion is rather ir- 
relevant to this useful conclusion but 
of course not utterly irrelevant. For 
readings of 85° or 110°, you would 
revise your statistical model according 
to which the thermometer is accurate 
and correctly used, rather than pro- 
claim a medical miracle. A reading of 
104° would be puzzling—too incon- 
sistent with your prior opinion to 
seem reasonable and yet not obviously 
absurd. You might try again, perhaps 
with another thermometer. 

It has long been known that, under 
suitable circumstances, your actual 
posterior distribution will be ap- 
proximately what it would have been 
had your prior distribution been 
uniform, that is, described by a 
constant density. As the fever ex- 
ample suggests, prior distributions 
need not be, and never really are, 
completely uniform. To ignore the 
departures from uniformity, it suffices 
that your actual prior density change 
gently in the region favored by the 
data and not itself too strongly favor 
some other region. 

But what is meant by “gently,” by 
“region favored by the data,” by 
“region favored by the prior dis- 
tribution,” and by two distributions 
being approximately the same? Such 
questions do not have ultimate an- 
swers, but this section explores one 
useful set of possibilities. The mathe- 
matics and ideas have been current 
since Laplace, but we do not know 
any reference that would quite sub- 
stitute for the following mathematical 
paragraphs; Jeffreys (1939, see Section 
3.4 of the 1961 edition) and Lindley 
(1961) are pertinent. Those who 
would skip or skim the mathematics 
will find the trail again immediately 
following Implication 7, where the 
applications of stable estimation are 
informally summarized. 


Under some circumstances, the 
posterior probability density 


$$u(\lambda \mid x) = \frac{v(x \mid \lambda)\,u(\lambda)}{\int v(x \mid \lambda')\,u(\lambda')\,d\lambda'}$$  [5]


can be well approximated in some 
senses by the probability density 


$$w(\lambda \mid x) = \frac{v(x \mid \lambda)}{\int v(x \mid \lambda')\,d\lambda'},$$  [6]


where λ is a parameter or set of 
parameters, λ′ is a corresponding 
variable of integration, x is an obser- 
vation or set of observations, v(x|λ) 
is the probability (or perhaps proba- 
bility density) of x given λ, u(λ) is the 
prior probability density of λ, and the 
integrals are over the entire range of 
meaningful values of λ. By their 
nature, u, v, and w are nonnegative, 
and unless the integral in Equation 6 
is finite, there is no hope that the 
approximation will be valid, so these 
conditions are adopted for the fol- 
lowing discussion. 

Consider a region of values of λ, 
say B, which is so small that u(λ) 
varies but little within B and yet so 
large that B promises to contain much 
of the posterior probability of λ given 
the value of x fixed throughout the 
present discussion. Let α, β, γ, and φ 
be positive numbers, of which the 
first three should in practice be small, 
and are formally taken to be less than 
1. In these terms, three assumptions 
will be made that define one set of 
circumstances under which w(λ|x) 
does approximate u(λ|x) in certain 
senses, for the given x. 

Assumption 1: 


$$\int_{\bar{B}} w(\lambda \mid x)\,d\lambda \le \alpha \int_{B} w(\lambda \mid x)\,d\lambda,$$


where B̄ means, as usual, the com- 
plement of B. (That is, B is highly 
favored by the data; α might be 10⁻⁴ 
or less in everyday applications.) 

Assumption 2: For all λ ∈ B, 


$$\varphi \le u(\lambda) \le (1 + \beta)\,\varphi.$$


(That is, the prior density changes 
very little within B; .01 or even .05 
would be good everyday values for β. 
The value of φ is unimportant and is 
not likely to be accurately known.) 

Assumption 3: 


$$\int_{\bar{B}} u(\lambda \mid x)\,d\lambda \le \gamma \int_{B} u(\lambda \mid x)\,d\lambda.$$


(That is, B is also highly favored by 
the posterior distribution; in ap- 
plications, γ should be small, yet a γ 
as large as 100α, or even 1,000α, may 
have to be tolerated.) 

Assumption 3 looks, at first, hard 
to verify without much knowledge of 
u(λ). Consider an alternative: 


Assumption 3′: u(λ) ≤ θφ for all λ, 


where θ is a positive constant. (That 
is, u is nowhere astronomically big 
compared to its nearly constant values 
in B; a θ as large as 100 or 1,000 will 
often be tolerable.) 

Assumption 3′ in the presence of 
Assumptions 1 and 2 can imply 3, as 
is seen thus. 


$$\frac{\int_{\bar{B}} u(\lambda \mid x)\,d\lambda}{\int_{B} u(\lambda \mid x)\,d\lambda} = \frac{\int_{\bar{B}} v(x \mid \lambda)\,u(\lambda)\,d\lambda}{\int_{B} v(x \mid \lambda)\,u(\lambda)\,d\lambda} \le \frac{\theta\varphi \int_{\bar{B}} v(x \mid \lambda)\,d\lambda}{\varphi \int_{B} v(x \mid \lambda)\,d\lambda} \le \theta\alpha.$$


So if γ ≥ θα, Assumption 3′ implies 
Assumption 3. 


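Seven implications of Assumptions 1, 2, and 
3 are now derived. The first three may be 
viewed mainly as steps toward the later ones. 
The expressions in the large brackets serve 
only to help prove the numbered assertions. 


Implication 1: 

$$\int v(x \mid \lambda)\,u(\lambda)\,d\lambda \left[\ge \int_{B} v(x \mid \lambda)\,u(\lambda)\,d\lambda \ge \varphi \int_{B} v(x \mid \lambda)\,d\lambda\right] \ge \frac{\varphi}{1+\alpha} \int v(x \mid \lambda)\,d\lambda.$$


Implication 2: 

$$\int v(x \mid \lambda)\,u(\lambda)\,d\lambda \left[= \int_{B} v(x \mid \lambda)\,u(\lambda)\,d\lambda + \int_{\bar{B}} v(x \mid \lambda)\,u(\lambda)\,d\lambda \le (1+\gamma) \int_{B} v(x \mid \lambda)\,u(\lambda)\,d\lambda\right] \le (1+\gamma)(1+\beta)\,\varphi \int v(x \mid \lambda)\,d\lambda.$$


With two new positive constants δ and ε 
defined by the context, the next implication 
follows easily. 


Implication 3: 

$$(1 - \delta) = \frac{1}{(1+\beta)(1+\gamma)} \le \frac{u(\lambda \mid x)}{w(\lambda \mid x)} \le (1+\beta)(1+\alpha) = (1 + \varepsilon)$$

for all λ in B, except where numerator and 
denominator of u(λ|x)/w(λ|x) both vanish. 
(Note that if α, β, and γ are small, so are 
δ and ε.) 


Let u(C|x) and w(C|x) denote 

$$\int_{C} u(\lambda \mid x)\,d\lambda \quad \text{and} \quad \int_{C} w(\lambda \mid x)\,d\lambda,$$

that is, the probabilities of 
C under the densities u(λ|x) and w(λ|x). 


Implication 4: u(B|x) ≥ 1 − γ, and for 
every subset C of B, 

$$1 - \delta \le \frac{u(C \mid x)}{w(C \mid x)} \le 1 + \varepsilon.$$


Implication 5: If t is a function of λ such 
that |t(λ)| ≤ T for all λ, then 

$$\left| \int t(\lambda)\,u(\lambda \mid x)\,d\lambda - \int t(\lambda)\,w(\lambda \mid x)\,d\lambda \right|$$
$$\left[\le \int_{B} |t(\lambda)|\,|u(\lambda \mid x) - w(\lambda \mid x)|\,d\lambda + \int_{\bar{B}} |t(\lambda)|\,u(\lambda \mid x)\,d\lambda + \int_{\bar{B}} |t(\lambda)|\,w(\lambda \mid x)\,d\lambda \right.$$
$$\left. \le T \int_{B} \left| \frac{u(\lambda \mid x)}{w(\lambda \mid x)} - 1 \right| w(\lambda \mid x)\,d\lambda + T(\gamma + \alpha) \right]$$
$$\le T\,[\max(\delta, \varepsilon) + \gamma + \alpha].$$


Implication 6: |u(C|x) − w(C|x)| 
≤ max(δ, ε) + γ + α for all C. 


It is sometimes important to evaluate 
u(C|x) with fairly good percentage accuracy 
when u(C|x) is small but not nearly so small 
as α or γ, thus. 


Implication 7: 

$$(1 - \delta)\left(1 - \frac{\alpha}{w(C \mid x)}\right) \left[\le (1 - \delta)\,\frac{w(C \cap B \mid x)}{w(C \mid x)} \le \frac{u(C \cap B \mid x)}{w(C \mid x)}\right] \le \frac{u(C \mid x)}{w(C \mid x)}$$
$$\left[\le \frac{u(C \cap B \mid x) + \gamma}{w(C \mid x)} \le (1 + \varepsilon)\,\frac{w(C \cap B \mid x)}{w(C \mid x)} + \frac{\gamma}{w(C \mid x)}\right] \le (1 + \varepsilon) + \frac{\gamma}{w(C \mid x)}.$$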
What does all this epsilontics mean 
for practical statistical work? The 
overall goal is valid justification for 
proceeding as though your prior 
distribution were uniform. A set of 
three assumptions implying this justi- 
fication was pointed out: First, some 
region B is highly favored by the data. 
Second, within B the prior density 
changes very little. Third, most of 
the posterior density is concentrated 
inside B. According to a more 
stringent but more easily verified 
substitute for the third assumption, 
the prior density nowhere enormously 
exceeds its general value in B. 

Given the three assumptions, what 
follows? One way of looking at the 
implications is to observe that no- 
where within B, which has high 
posterior probability, is the ratio of 
the approximate posterior density to 
the actual posterior density much 
different from 1 and that what hap- 
pens outside B is not important for 
some purposes. Again, if the posterior 
expectation, or average, of some 
bounded function is of interest, then 
the difference between the expectation 
under the actual posterior distribution 
and under the approximating dis- 
tribution will be small relative to the 
absolute bound of the function. 
Finally, the actual posterior proba- 
bility and the approximate probability 
of any set of parameter values are 
nearly equal. In short, the approxi- 
mation is a good one in several im- 
portant respects—given the three 
assumptions. Still other respects 
must sometimes be invoked and these 
may require further assumptions. 
See, for example, Lindley (1961). 
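
The summary above can be checked numerically on a grid. The sketch below is an illustration under stated assumptions: the prior (a small bump near 98.6° plus a broad fever region) is hypothetical, the thermometer is taken to be normal with standard deviation .05° and reading 101.0°, and B is the interval from 100.8° to 101.2°. It computes the actual posterior and the uniform-prior approximation, then compares the largest discrepancy in the probability of any set C with a bound of the form max(δ, ε) + γ + α.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Candidate true temperatures, 95.0 to 108.0 degrees in steps of .001.
GRID = [95.0 + 0.001 * i for i in range(13001)]
B_LOW, B_HIGH = 100.8, 101.2


def prior(lam):
    # HYPOTHETICAL prior density, chosen only for illustration.
    return 0.1 * normal_pdf(lam, 98.6, 0.3) + 0.9 * normal_pdf(lam, 101.5, 1.2)


def likelihood(lam, reading=101.0, sd=0.05):
    # Probability density of the reading given the true temperature lam.
    return normal_pdf(reading, lam, sd)


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


u_post = normalize([likelihood(l) * prior(l) for l in GRID])  # actual posterior
w_post = normalize([likelihood(l) for l in GRID])             # uniform-prior form

inside = [B_LOW <= l <= B_HIGH for l in GRID]


def odds_outside(post):
    """Probability outside B divided by probability inside B."""
    p_in = sum(p for p, flag in zip(post, inside) if flag)
    return (1.0 - p_in) / p_in


alpha = odds_outside(w_post)                         # B favored by the data
gamma = odds_outside(u_post)                         # B favored by the posterior
prior_in_b = [prior(l) for l, flag in zip(GRID, inside) if flag]
beta = max(prior_in_b) / min(prior_in_b) - 1.0       # variation of the prior in B

delta = 1.0 - 1.0 / ((1.0 + beta) * (1.0 + gamma))
epsilon = (1.0 + beta) * (1.0 + alpha) - 1.0

# Largest difference in the probability of any set C of grid points.
worst_gap = 0.5 * sum(abs(u - w) for u, w in zip(u_post, w_post))
bound = max(delta, epsilon) + gamma + alpha
```

For this assumed prior the actual discrepancy is far smaller than the bound, which is typical: the bound is a guarantee, not an estimate.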

Even when Assumption 2 is not 
applicable, a transformation of the 
parameters of the prior distribution 
sometimes makes it so. If, for ex- 
ample, your prior distribution roughly 
obeys Weber’s law, so that you tend 
to assign about as much probability 
to the region from λ to 2λ as to the 
region from 10λ to 20λ, a logarithmic 
transformation of λ may well make 
Assumption 2 applicable for a con- 
siderably smaller β than otherwise. 
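
A short sketch shows the effect of the transformation. It assumes, for illustration only, an unnormalized prior density proportional to 1/λ (which gives equal probability to the regions from λ to 2λ and from 10λ to 20λ) and a hypothetical region B from 10 to 12; the function names are likewise hypothetical.

```python
import math


def beta_on_interval(density, low, high, points=1001):
    """Smallest beta such that the density stays between phi and
    (1 + beta) * phi on [low, high], checked on an evenly spaced grid."""
    values = [density(low + (high - low) * i / (points - 1)) for i in range(points)]
    return max(values) / min(values) - 1.0


def weber_density(lam):
    # ASSUMED prior shape: density proportional to 1 / lam.
    return 1.0 / lam


def weber_density_log_scale(mu):
    # Density of mu = log(lam): multiply by d(lam)/d(mu) = lam.
    lam = math.exp(mu)
    return weber_density(lam) * lam


beta_original = beta_on_interval(weber_density, 10.0, 12.0)
beta_logarithmic = beta_on_interval(
    weber_density_log_scale, math.log(10.0), math.log(12.0)
)
```

On the original scale the density varies by 20% across the region; on the logarithmic scale it is constant, so a much smaller β suffices.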

We must forestall a dangerous con- 
fusion. In the temperature example 
as in many others, the measurement 
x is being used to estimate the value 
of some parameter λ. In such cases, 
λ and x are measured in the same 
units (degrees Fahrenheit in the ex- 
ample) and interesting values of λ are 
often numerically close to observed 
values of x. It is therefore imperative 
to maintain the conceptual distinction 
between λ and x. When the principle 
of stable estimation applies, the 
normalized function v(x|λ) as a func- 
tion of λ, not of x, approximates your 
posterior distribution. The point is 
perhaps most obvious in an example 
such as estimating the area of a circle 
by measuring its radius. In this case, 
λ is in square inches, x is in inches, and 
there is no temptation to think that 
the form of the distribution of x’s is 
the same as the form of the posterior 
distribution of λ’s. But the same 
point applies in all cases. The func- 
tion v(x|λ) is a function of both x and 
λ; only by coincidence will the form 
or the parameters of v(x|λ) considered 
as a function of λ be the same as its 
form or parameters considered as a 
function of x. One such coincidence 
occurs so often that it tends to mis- 
lead intuition. When your statistical 
model leads you to expect that a set 
of observations will be normally dis- 
tributed, then the posterior distribu- 
tion of the mean of the quantity being 
observed will, if stable estimation ap- 
plies, be normal with the mean equal 
to the mean of the observations. (Of 
course it will have a smaller standard 
deviation than the standard deviation 
of the observations.) 
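
The coincidence can be seen in a small grid calculation. The sketch below makes assumptions that the text does not state: the standard deviation of a single observation is treated as known, the four readings are hypothetical, and the name `posterior_mean_and_sd` is introduced here. It normalizes the likelihood as a function of λ and reports the mean and standard deviation of the result.

```python
import math


def posterior_mean_and_sd(observations, sigma, low, high, points=4001):
    """Mean and sd of the normalized likelihood, viewed as a density in
    lam on [low, high], for normal observations with known sd sigma."""
    grid = [low + (high - low) * i / (points - 1) for i in range(points)]
    log_like = [
        sum(-0.5 * ((x - lam) / sigma) ** 2 for x in observations)
        for lam in grid
    ]
    top = max(log_like)
    weights = [math.exp(value - top) for value in log_like]
    total = sum(weights)
    mean = sum(lam * wt for lam, wt in zip(grid, weights)) / total
    variance = sum((lam - mean) ** 2 * wt for lam, wt in zip(grid, weights)) / total
    return mean, math.sqrt(variance)


observations = [100.9, 101.0, 101.1, 101.0]   # hypothetical readings
sigma = 0.1                                   # ASSUMED known sd of one reading

mean, sd = posterior_mean_and_sd(observations, sigma, 100.0, 102.0)
sample_mean = sum(observations) / len(observations)
```

Here the approximate posterior is centered at the sample mean, and its standard deviation is sigma divided by the square root of the number of observations, half that of a single observation in this example.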

FIG. 1. u(λ) and v(x|λ) for the fever thermometer example. (Note that the units 
on the y axis are different for the two functions.) Horizontal axis: λ (degrees Fahrenheit). 


Numerically, what can the principle of 
stable estimation do for the fever-ther- 
mometer example? Figure 1 is a reasonably 
plausible numerical picture of the situation. 
Your prior distribution in your role as invalid 
has a little bump around 98.6°, because on 
other occasions you have taken your tem- 
perature when feeling out of sorts and found 
it depressingly normal. Still, you really think 
you have a fever, so most of your density is 
spread over the region 99.5°-104.5°. It gets 
rather low at the high end of that interval, 
since you doubt that you could have so much 
as a 104° fever without feeling even worse 
than you do. 

The thermometer has a standard deviation 
of .05° and negligible systematic error—this 
is reasonable for a really good clinical ther- 
mometer, the systematic error of which 
should be small compared to the errors of 
procedure and reading. For convenience and 
because it is plausible as an approximation, 
we assume also that the thermometer 
distributes its errors normally. The indicated 
reading will, then, lie within a symmetric 
region .1° wide around the true temperature 
with probability a little less than .7. If the 
thermometer reading is 101.0°, we might take 
the region B to extend from 100.8° to 101.2°, 
that is, four standard deviations on each side 
of the observation. According to tables of the 
normal distribution, α is then somewhat less 
than 10⁻⁴. 

The number φ should be thought of as the 
smallest value of u(λ) within B, but its actual 
value cancels out of all important calculations 
and so is immaterial. For the same reason, 
it is also immaterial that the two functions 
v(101.0|λ) and u(λ) graphed in Figure 1 are 
not measured in the same units and therefore 
cannot meaningfully share the same vertical 
scale; in so drawing them, we sin against 
logic but not against the calculation of u(λ|x) 
or w(λ|x). Figure 1 suggests that β is at 
most .05, and we shall work with that value, 
but it is essential to give some serious 
justification for this crucial assumption, as we 
shall later. 

We justify Assumption 3 by way of As- 
sumption 3’. The figure, drawn for quali- 
tative suggestion rather than accuracy, makes 
a θ of 2 look reasonable, but since you may 
have a very strong suspicion that your tem- 
perature is nearly normal, we take θ = 100 
for safety. The real test is whether there is 
any hundredth, say, of a degree outside of B 
that you initially held to be more than 100 
times as probable as the initially least 
probable hundredth in B. You will not find 
this question about yourself so hard, espe- 
cially since little accuracy is required. 

Actually, the technique based on θ could 
fail utterly without really spoiling the 
program. Suppose, for example, you really 
think it pretty unlikely that you have a 
fever and have unusually good knowledge of 
the temperature that is normal for you (at 
this hour). You may then have as much 
probability as .95 packed into some interval 
of .1° near normal, but in no such short 
interval in B are you likely to have more than 
one fiftieth of the residual probability. This 
leads to a θ of at least .95/(.05 × .02) = 950. 
Fortunately, different, but somewhat analo- 
gous, calculations show that even very high 
concentrations of initial probability in a 
region very strongly discredited by the data 
do not interfere with the desired approxima- 
tion. This alternative sort of calculation will 
be made clear by later examples about 
hypothesis testing. 

Returning from the digression, continue 
with θ = 100. The comment after Assump- 
tion 3′ leads to γ = θα = 10⁻⁴ × 10² = .01. 

Explore now some of the consequences of 
the theory of stable estimation for the ex- 
ample: w(λ|101.0) is normal about 101° with 
a standard deviation of .05°. If the region B 
is taken to be the interval from 100.8° to 
101.2°, then α = 10⁻⁴, β = .05, and γ = .01. 
Therefore, δ = 1 − [(1 + β)(1 + γ)]⁻¹ < .06, 
and ε = (1 + β)(1 + α) − 1 < .051. Ac- 
cording to Implication 4, for any C in B, 
u(C|101.0) differs by at most about 6% from 
the explicitly computable w(C|101.0). For 
any C, whether in B or not, Implication 6 
guarantees |u(C|101.0) − w(C|101.0)| < .068. 
An especially interesting example for C is the 
outside of some interval that has, say, 
95% probability under w(λ|101.0) so that 
w(C|101.0) = .05. Will u(C|101.0) be mod- 
erately close to 5%? Implications 4 and 
6 do not say so, but Implication 7 says 
that (.94)(.0499) = .0470 < u(C|101.0) 
< (1.050)(.05) + .01 = .0625. This is not so 
crude for the sort of situation where such a 
u(C|101.0) might be wanted. Even if 
w(C|101.0) is only .01, we get consider- 
able information about u(C|101.0); .0093 
< u(C|101.0) < .021. For w(C|101.0) 
= .001, .000849 < u(C|101.0) ≤ .011. At 
this stage, the upper bound has become al- 
most useless, and when w(C|101.0) is as 
small as 10⁻⁴, the lower bound is utterly 
useless. 
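
The figures in this example can be reproduced in a few lines. The sketch below uses the values adopted in the text (α = 10⁻⁴, β = .05, θ = 100, a thermometer standard deviation of .05°, and a region B extending .2° on each side of the reading); the variable and function names are hypothetical, and the bounds are written as (1 − δ)(w − α) below and (1 + ε)w + γ above, which is how the quoted numbers come out.

```python
import math

sd = 0.05            # thermometer standard deviation, from the text
half_width = 0.2     # B extends .2 degrees on each side of the reading

# Two-sided normal tail beyond 4 standard deviations, and the
# corresponding ratio of probability outside B to probability inside B.
tail = math.erfc((half_width / sd) / math.sqrt(2.0))
alpha_from_tables = tail / (1.0 - tail)

alpha, beta, theta = 1e-4, 0.05, 100.0    # values adopted in the text
gamma = theta * alpha
delta = 1.0 - 1.0 / ((1.0 + beta) * (1.0 + gamma))
epsilon = (1.0 + beta) * (1.0 + alpha) - 1.0


def posterior_bounds(w_c):
    """Lower and upper bounds on u(C|x) given the approximate
    probability w(C|x), in the form used for the numbers in the text."""
    lower = (1.0 - delta) * (w_c - alpha)
    upper = (1.0 + epsilon) * w_c + gamma
    return lower, upper


bounds_05 = posterior_bounds(0.05)
bounds_01 = posterior_bounds(0.01)
bounds_001 = posterior_bounds(0.001)
```

The lower bound falls to zero when w(C|x) equals α, which is why it carries no information for sets as improbable as 10⁻⁴.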

Implication 5, and extensions of it are also 
applicable. If, for example, you record what 
the thermometer says, the mean error and the 
root-mean-squared error of the recorded 
value, averaged according to your own 
opinion, should be about 0° and about .05°, 
respectively, according to a slight extension 
of Implication 5. 

To re-emphasize the central point, those 
details about your initial opinion that were 
not clear to you yourself, about which you 
might not agree with your neighbor, and that 
would have been complicated to keep track 
of anyway can be neglected after a fairly 
good measurement. 

A vital matter that has been postponed is 
to adduce a reasonable value for β. Like θ, 
β is an expression of personal opinion. In any 
application, β must be large enough to be an 
expression of actual opinion or, in “public” 
applications, of “public” opinion. If your 
opinion were perfectly clear or if the public 
were of one mind, you could determine β by 
dividing the maximum of your u(λ) in B by 
its minimum and subtracting 1; but the most 
important need for β arises just when clarity 
or agreement is lacking. For unity of 
discussion, permit us to focus on the problem 
imposed by lack of clarity. 

One way to express the lack of clarity, or 
the vagueness, of an actual set of opinions 
about λ is to say that many somewhat
different densities portray your opinion
tolerably well. In assuming that .05 was a
sufficiently large β for the fever example, we
were assuming that you would reject as
unrealistic any initial density u(λ) whose
maximum in the interval B from 100.8° to 
101.2° exceeds its minimum in B by as much 
as 5%. But how can you know such a thing 
about yourself? Still more, how could you 
hope to guess it about another? 

To begin with, you might consider pairs of 
very short intervals in B and ask how much 
more probable one is than the other, but this 
will fail in realistic problems. To see why it 
fails, ask yourself what odds Ω you would
offer (initially) for the last hundredth of a
degree in B against the first hundredth; that
is, imagine contracting to pay $Ω if λ is in the
first hundredth of a degree of B, to receive
$1.00 if it is in the last hundredth, and to be
quits otherwise. If, for instance, you are
feeling less sick than 101°, then you will be
clear that u(λ) is decreasing throughout B,
that Ω is less than 1, and that 1 − Ω would
be the smallest valid value for β. However,
you are likely to be highly confused about Ω.
Doubtless Ω is very little less than 1. Is .9999
much too large or .91 much too small? We 
find it hard to answer when the question is 
put thus, and so may you. 

As an entering wedge, consider an interval
much longer than B, say from 100° to 102°.
Perhaps you find u(λ) to decrease even
throughout this interval and even to decrease
moderately perceptibly between its two end
points. The ratio u(100)/u(102), while distinctly
greater than 1, may be convincingly
less than 1.2. If the proportion by which
u(λ) diminished in every hundredth of a
degree from 100° to 102° were the same
(more formally, if the logarithmic derivative
of u(λ) were constant between 100° and 102°),
then u(100.8)/u(101.2) would be at most
$(1.2)^{.4/2} = (1.2)^{.2} = 1.037$. Of course the
rate of decrease is not exactly constant, but
it may seem sufficiently generous to round
1.037 up to 1.05, which results in the β of .05
used in this example. Had you taken your
temperature 25 times (with random error
but negligible systematic error), which would
not be realistic in this example but would be
in some other experimental settings, then the
standard error of the measurements would
have been .01, and B would have needed to
be only .08° instead of .4° wide to take in
eight standard deviations. Under those
circumstances, β could hardly need to be
greater than .01, that is, $(1.05)^{.08/.4} - 1$.
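
The arithmetic of this paragraph is easy to reproduce. The Python sketch below assumes, as the paragraph does, a constant logarithmic derivative of the prior density, so that a ratio bound over a long interval scales to a shorter interval by raising it to the ratio of the two widths. The function name `beta_from_ratio` is illustrative and not part of the original exposition.

```python
def beta_from_ratio(ratio_bound, long_width, short_width):
    """Bound on (max/min - 1) of the prior density over the short interval.

    Assumes the density changes by a constant proportion per unit of the
    parameter (constant logarithmic derivative), so the ratio over the short
    interval is ratio_bound ** (short_width / long_width).
    """
    return ratio_bound ** (short_width / long_width) - 1.0


# Ratio at most 1.2 across the 2-degree interval from 100 to 102,
# scaled down to the .4-degree interval B.
beta_single = beta_from_ratio(1.2, 2.0, 0.4)

# With 25 readings B shrinks from .4 to .08 degrees; the generous 1.05
# adopted for the wider B is scaled down the same way.
beta_25 = beta_from_ratio(1.05, 0.4, 0.08)

print(f"single reading: beta is about {beta_single:.3f}")
print(f"25 readings:    beta is about {beta_25:.4f}")
```

The first value is about .037, which the text rounds up generously to .05; the second is just under .01.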


How good should the approxima- 
tion be before you can feel comfortable 
about using it? That depends en- 
tirely on your purpose. There are 
purposes for which an approximation 
of a small probability which is sure to 
be within fivefold of the actual proba- 
bility is adequate. For others, an 
error of 1% would be painful. For- 
tunately, if the approximation is un- 
satisfactory it will often be possible to 
improve it as much as seems necessary 
at the price of collecting additional 
data, an expedient which often justi- 
fies its cost in other ways too. In 
practice, the accuracy of the
stable-estimation approximation will seldom
be so carefully checked as in the fever
example. As individual and collective 
experience builds up, many applica- 
tions will properly be judged safe at a 
glance. 

Far from always can your prior 
distribution be practically neglected. 
At least five situations in which de- 
tailed properties of the prior distribu- 
tion are crucial occur to us: 

1. If you assign exceedingly small 
prior probabilities to regions of λ for
which v(x|λ) is relatively large, you in
effect express reluctance to believe in
values of λ strongly pointed to by the
data and thus violate Assumption 3, 
perhaps irreparably. Rare events do 
occur, though rarely, and should not 
be permitted to confound us utterly. 
Also, apparatus and plans can break 
down and produce data that “prove” 
preposterous things. Morals conflict 
in the fable of the Providence man 
who on a cloudy summer day went to 
the post office to return his absurdly 
low-reading new barometer to Aber- 
crombie and Fitch. His house was 
flattened by a hurricane in his absence. 

2. If you have strong prior reason
to believe that λ lies in a region for
which v(x|λ) is very small, you may
be unwilling to be persuaded by the
evidence to the contrary, and so again
may violate Assumption 3. In this
situation, the prior distribution might
consist primarily of a very sharp spike,
whereas v(x|λ), though very low in
the region of the prior spike, may be
comparatively gentle everywhere. In
the previous paragraph, it was v(x|λ)
which had the sharp spike, and the
prior distribution which was near zero
in the region of that spike. Quite
often it would be inappropriate to
discard a good theory on the basis of a
single opposing experiment. Hypothesis
testing situations discussed later
in this paper illustrate this phenomenon.

3. If your prior opinion is relatively
diffuse, but so are your data, then 
Assumption 1 is seriously violated. 
For when your data really do not 
mean much compared to what you 
already know, then the exact content 
of the initial opinion cannot be 
neglected. 

4. If observations are expensive and 
you have a decision to make, it may 
not pay to collect enough information 
for the principle of stable estimation 
to apply. In such situations you 
should collect just so much informa- 
tion that the expected value of the 
best course of action available in the 
light of the information at hand is 
greater than the expected value of any 
program that involves collecting more 
observations. If you have strong 
prior opinions about the parameter, 
the amount of new information avail- 
able when you stop collecting more 
may well be far too meager to satisfy 
the principle. Often, it will not pay 
you to collect any new information 
at all. 

5. It is sometimes necessary to 
make decisions about sizable research 
commitments such as sample size or 
experimental design while your knowl- 
edge is still vague. In this case, an 
extreme instance of the former one, 
the role of prior opinion is particularly 
conspicuous. As Raiffa and Schlaifer 
(1961) show, this is one of the most 
fruitful applications of Bayesian ideas. 

Whenever you cannot neglect the 
details of your prior distribution, you 
have, in effect, no choice but to 
determine the relevant aspects of it as 
best you can and use them. Almost 
always, you will find your prior 
opinions quite vague, and you may be 
distressed that your scientific infer- 
ence or decision has such a labile 
basis. Perhaps this distress, more 
than anything else, discouraged statisticians
from using Bayesian ideas
all along (Pearson, 1962). To paraphrase
de Finetti (1959, p. 19), people
noticing difficulties in applying Bayes’ 
theorem remarked “We see that it is 
not secure to build on sand. Take 
away the sand, we shall build on the 
void.” If it were meaningful utterly 
to ignore prior opinion, it might 
presumably sometimes be wise to do 
so; but reflection shows that any 
policy that pretends to ignore prior 
opinion will be acceptable only insofar 
as it is actually justified by prior 
opinion. Some policies recommended 
under the motif of neutrality, or using 
only the facts, may flagrantly violate 
even very confused prior opinions, 
and so be unacceptable. The method 
of stable estimation might casually be 
described as a procedure for ignoring 
prior opinion, since its approximate 
results are acceptable for a wide range 
of prior opinions. Actually, far from 
ignoring prior opinion, stable estima- 
tion exploits certain well-defined fea- 
tures of prior opinion and is acceptable 
only insofar as those features are 
really present. 


A SMATTERING OF BAYESIAN 
DISTRIBUTION THEORY 


The mathematical equipment re- 
quired to turn statistical principles 
into practical procedures, for Bayesian 
as well as for traditional statistics, is 
distribution theory, that is, the theory 
of specific families of probability 
distributions. Bayesian distribution 
theory, concerned with the interrela- 
tion among the three main distribu- 
tions of Bayes’ theorem, is in some 
respects more complicated than classi- 
cal distribution theory. But the 
familiar properties that distributions 
have in traditional statistics, and in 
the theory of probability in general, 
remain unchanged. To a professional 
statistician, the added complication 
requires little more than possibly a
shift to a more complicated notation.
Chapters 7 through 13 of Raiffa and
Schlaifer’s (1961) book are an extensive
discussion of distribution theory
for Bayesian statistics. 

As usual, a consumer need not 
understand in detail the distribution 
theory on which the methods are 
based; the manipulative mathematics 
are being done for him. Yet, like any 
other theory, distribution theory must 
be used with informed discretion. 
The consumer who delegates his 
thinking about the meaning of his 
data to any “powerful new tool” of 
course invites disaster. Cookbooks, 
though indispensable, cannot sub- 
stitute for a thorough understanding 
of cooking; the inevitable appearance 
of cookbooks of Bayesian statistics 
must be contemplated with ambi- 
valence. 

Conjugate distributions. Suppose 
you take your temperature at a 
moment when your prior probability 
density u(λ) is not diffuse with respect
to v(x|λ), so your posterior opinion
u(λ|x) is not adequately approximated
by w(λ|x). Determination and
application of u(λ|x) may then require
laborious numerical integrations
of arbitrary functions. One way to 
avoid such labor that is often useful 
and available is to use conjugate dis- 
tributions. When a family of prior 
distributions is so related to all the 
conditional distributions which can 
arise in an experiment that the 
posterior distribution is necessarily in 
the same family as the prior distribu- 
tions, the family of prior distributions 
is said to be conjugate to the experi- 
ment. By no means all experiments 
have nontrivial conjugate families, 
but a few ubiquitous kinds do. 
Examples: Beta priors are conjugate
to observations of a Bernoulli process;
normal priors are conjugate to observations
of a normal process with
known variance. Several other conjugate
pairs are discussed by Raiffa
and Schlaifer (1961).
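
The first of these conjugate pairs can be illustrated in a few lines. The Python sketch below relies on the standard result that a Beta(a, b) prior combined with s successes and f failures from a Bernoulli process yields a Beta(a + s, b + f) posterior. The prior parameters and the data are hypothetical, chosen only to show that the posterior stays in the same family.

```python
def beta_bernoulli_update(a, b, successes, failures):
    """Posterior Beta parameters after observing Bernoulli trials.

    The prior is Beta(a, b); the posterior is Beta(a + successes, b + failures).
    """
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


prior = (2.0, 2.0)  # hypothetical prior opinion, centered on .5
posterior = beta_bernoulli_update(*prior, successes=7, failures=3)

print("prior mean:    ", beta_mean(*prior))
print("posterior:      Beta", posterior)
print("posterior mean:", round(beta_mean(*posterior), 4))
```

No numerical integration is needed: the whole effect of the data is carried by two additions.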

Even when there is a conjugate 
family of prior distributions, your own 
prior distribution could fail to be in 
or even near that family. The dis- 
tributions of such a family are, 
however, often versatile enough to 
accommodate the actual prior opinion, 
especially when it is a bit hazy. 
Furthermore, if stable estimation is 
nearly but not quite justifiable, a 
conjugate prior which approximates 
your true prior even roughly may be 
expected to combine with v(x|λ) to
produce a rather accurate posterior 
distribution. 

Should the fit of members of the 
conjugate family to your true opinion 
be importantly unsatisfactory, realism 
may leave no alternative to something 
as tedious as approximating the con- 
tinuous distribution by a discrete one 
with many steps, and applying Bayes- 
ian logic by brute force. Respect 
for your real opinion as opposed to 
some handy stereotype is essential. 
That is why our discussion of stable 
estimation, even in this expository 
paper, emphasized criteria for decid- 
ing when the details of a prior 
opinion really are negligible. 
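
The brute-force alternative mentioned above, a discrete approximation with many steps, can be sketched as follows. This Python example is an assumption-laden illustration and not a procedure from the original paper: the grid limits, the step size, and the helper names are all hypothetical. A normal prior is used only so that the grid result can be compared with the known conjugate answer; any other prior density could be passed in the same way.

```python
import math


def normal_pdf(z, mean, sd):
    """Density of a normal distribution at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def grid_posterior(grid, prior_density, likelihood):
    """Discrete Bayes' theorem on an evenly spaced grid.

    Returns posterior probabilities, one per grid point, proportional to
    prior_density(lam) * likelihood(lam) and normalized to sum to 1.
    """
    weights = [prior_density(lam) * likelihood(lam) for lam in grid]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical setup: prior N(0, 1), one observation x = 1 with sd 1.
grid = [-10.0 + 0.005 * i for i in range(4001)]
x = 1.0
post = grid_posterior(
    grid,
    prior_density=lambda lam: normal_pdf(lam, 0.0, 1.0),
    likelihood=lambda lam: normal_pdf(x, lam, 1.0),
)

post_mean = sum(lam * p for lam, p in zip(grid, post))
post_var = sum((lam - post_mean) ** 2 * p for lam, p in zip(grid, post))
print(f"grid posterior mean {post_mean:.4f}, variance {post_var:.4f}")
```

For this conjugate check case the exact posterior has mean .5 and variance .5, and the grid reproduces both closely. The value of the method lies in the cases where `prior_density` is not a member of any convenient family.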

An example: Normal measurement 
with variance known. To give a 
minimal illustration of Bayesian dis- 
tribution theory, and especially of 
conjugate families, we discuss briefly, 
and without the straightforward alge- 
braic details, the normally distributed 
measurement of known variance. The 
Bayesian treatment of this problem 
has much in common with its classical 
counterpart. As is well known, it is a 
good approximation to many other 
problems in statistics. In particular, 
it is a good approximation to the case 
of 25 or more normally distributed 
observations of unknown variance,
with the observed standard error of
the mean playing the role of the
known standard deviation and the
observed mean playing the role of the
single observation. In the following
discussion and throughout the remainder
of the paper, we shall discuss
the single observation x with known
standard deviation σ, and shall leave
it to you to make the appropriate
translation into the set of n = 25
observations with mean x̄ (= x) and
standard error of the mean s/√n (= σ),
whenever that translation aids your
intuition or applies more directly to
the problem you are thinking about.
Much as in classical statistics, it is
also possible to take uncertainty
about σ explicitly into account by
means of Student’s t. See, for example,
Chapter 11 of Raiffa and
Schlaifer (1961).

Three functions enter into the problem
of known variance: u(λ), v(x|λ),
and u(λ|x). The reciprocal of the
variance appears so often in Bayesian
calculations that it is convenient to
denote 1/σ² by h and call h the
precision of the measurement. We
are therefore dealing with a normal
measurement with an unknown mean
μ but known precision h. Suppose
your prior distribution is also normal.
It has a mean μ₀ and a precision h₀,
both known by introspection. There
is no necessary relationship between
h₀ and h, the precision of the measurement,
but in typical worthwhile
applications h is substantially greater
than h₀. After an observation has
been made, you will have a normally
distributed posterior opinion, now
with mean μ₁ and precision h₁, where

$$\mu_1 = \frac{\mu_0 h_0 + x h}{h_0 + h}$$

and

$$h_1 = h_0 + h.$$

The posterior mean is an average of
the prior mean and the observation
weighted by the precisions. The
precision of the posterior mean is the
sum of the prior and data precisions.
The posterior distribution in this case
is the same as would result from the
principle of stable estimation if in
addition to the datum x, with its
precision h, there had been an additional
measurement of value μ₀ and
precision h₀.
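
The algebra omitted here is short enough to sketch. Assuming, as in the text, a normal prior with mean $\mu_0$ and precision $h_0$ and a normal measurement $x$ with precision $h$, completing the square in the exponent of prior times likelihood gives the stated posterior mean and precision. Constants of proportionality that do not involve $\mu$ are dropped. This is one standard route to the result and is not necessarily the derivation the authors had in mind.

```latex
\begin{aligned}
u(\mu \mid x) &\propto u(\mu)\, v(x \mid \mu)
  \propto \exp\!\left[-\tfrac{1}{2} h_0 (\mu - \mu_0)^2
                      - \tfrac{1}{2} h (x - \mu)^2\right] \\
&\propto \exp\!\left[-\tfrac{1}{2}(h_0 + h)\,\mu^2
                      + (\mu_0 h_0 + x h)\,\mu\right] \\
&\propto \exp\!\left[-\tfrac{1}{2}(h_0 + h)
        \left(\mu - \frac{\mu_0 h_0 + x h}{h_0 + h}\right)^{2}\right].
\end{aligned}
```

The last line is the kernel of a normal density with precision $h_0 + h$ and mean $(\mu_0 h_0 + x h)/(h_0 + h)$, in agreement with the two formulas above.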

If the prior precision h₀ is very small
relative to h, the posterior mean will
probably, and the precision will cer- 
tainly, be nearly equal to the data 
mean and precision; that is an explicit 
illustration of the principle of stable 
estimation. Whether or not that 
principle applies, the posterior preci- 
sion will always be at least the larger 
of the other two precisions; therefore, 
observation cannot but sharpen opin- 
ion here. This conclusion is some- 
what special to the example; in 
general, an observation will occasionally
increase, rather than dispel, doubt.

In applying these formulas, as an
approximation, to inference based on
a large number n of observations with
average x̄ and sample variance s², x is x̄
and h is n/s². To illustrate both the
extent to which the prior distribution
can be irrelevant and the rapid narrowing
of the posterior distribution as
the result of a few normal observations,
consider Figure 2. The top section
of the figure shows two prior
distributions, one with mean −9 and
standard deviation 6 and the other
with mean 3 and standard deviation 2.
The other four sections show posterior
distributions obtained by applying
Bayes’ theorem to these two priors
after samples of size n are taken from
a distribution with mean 0 and standard
deviation 2. The samples are
artificially selected to have exactly the
mean 0. After 9, and still more after
16, observations, these markedly different
prior distributions have led
to almost indistinguishable posterior
distributions.

[Fig. 2. … after n normally distributed observations.]
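
The convergence shown in Figure 2 can be reproduced from the formulas for the posterior mean and precision. The Python sketch below uses the two priors and the data distribution described in the text. The sample sizes 1, 4, 9, and 16 are an assumption, since the text names only 4, 9, and 16 explicitly, and the function names are illustrative.

```python
import math


def normal_update(mu0, h0, x, h):
    """Posterior mean and precision for a normal prior and normal datum.

    mu0, h0: prior mean and precision; x, h: observation and its precision.
    """
    h1 = h0 + h
    mu1 = (mu0 * h0 + x * h) / h1
    return mu1, h1


def posterior_after_n(mu0, sd0, xbar, s, n):
    """Treat a sample mean as one observation with precision n / s**2."""
    return normal_update(mu0, 1.0 / sd0 ** 2, xbar, n / s ** 2)


priors = ((-9.0, 6.0), (3.0, 2.0))  # (mean, standard deviation) of each prior
results = {}
for n in (1, 4, 9, 16):             # assumed panel sizes
    for mu0, sd0 in priors:
        mu1, h1 = posterior_after_n(mu0, sd0, xbar=0.0, s=2.0, n=n)
        results[(n, mu0)] = (mu1, 1.0 / math.sqrt(h1))
        print(f"n={n:2d} prior mean {mu0:5.1f}: "
              f"posterior mean {mu1:7.3f}, sd {1.0 / math.sqrt(h1):.3f}")
```

After 16 observations the two posterior means are about −.06 and .18, with standard deviations of about .50 and .49, although the priors started 12 units apart.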

Of course the prior distribution is 
never irrelevant if the true parameter 
happens to fall in a region to which 
the prior distribution assigns virtually 
zero probability. A prior distribution 
which has a region of zero probability
is therefore undesirable unless you
really consider it impossible that the 
true parameter might fall in that 
region. Moral: Keep the mind open, 
or at least ajar. 

Figure 2 also shows the typical 
narrowing of the posterior distribution 
with successive observations. After 4 
observations, the standard deviation 
of your posterior distribution is less
than one half the standard deviation
of a single observation; after 16, less
than one fourth; and so on. In planning
experiments, it sometimes seems
distressing that the standard deviation
decreases only as the reciprocal of the
square root of the number of observations, so a
threefold improvement by sheer force
of numbers, if possible at all, costs at
least a ninefold effort. But subjects
in unpublished experiments by W. L. 
Hays, L. D. Phillips, and W. Edwards 
are unwilling to change their diffuse 
initial opinions into sharp posterior 
ones, even after exposure to over- 
whelming evidence. This reluctance 
to extract from data as much certainty 
as they permit may be widespread. 
If so, explicit application of Bayes’ 
theorem to information processing 
tasks now performed by unaided hu- 
man judgment may produce more effi- 
cient use of the available information 
(for a proposal along these lines, see 
Edwards, 1962a, 1963). 

When practical interest is focused 
on a few of several unknown pa- 
rameters, the general Bayesian method 
is to find first the posterior joint dis- 
tribution of all the parameters and 
from it to compute the corresponding 
marginal distribution of the param- 
eters of special interest. When, for 
instance, observations are drawn 
from a normal distribution of unknown
mean μ and standard deviation
σ, stable estimation applied to the
two parameters μ and ln σ followed by
elimination of ln σ leads to approximation
of the posterior distribution
of μ in terms of Student’s t distribution
with n − 1 degrees of freedom, in
somewhat accidental harmony with
classical statistics. (For those who
have not encountered it before, the
symbol ln stands for natural logarithm,
or logarithm to the base e.)

Frequently, however, too little is 
known about the distribution from
which a sequence of observations is
drawn to express it confidently in 
terms of any moderate number of 
parameters. These are the situations 
that have evoked what is called the 
theory of nonparametric statistics. 
Ironically, a main concern of non- 
parametric statistics is to estimate the 
parameters of unknown distributions. 
The classical literature on nonpara- 
metric statistics is vast; see I. R. 
Savage (1957, 1962) and Walsh 
(1962). Bayesian counterparts of 
some of it are to be expected but are 
not yet achieved. To hint at some 
nonparametric Bayesian ideas, it
seems reasonable to estimate the median
of a largely unknown distribution
by the median of the sample, and the 
mean of the distribution by the mean 
of the sample; given the sample, it 
will ordinarily be almost an even- 
money bet that the population median 
exceeds the sample median; and so on. 
Technically, the ‘‘and so on” points 
toward Bayesian justification for the 
classical theory of joint nonparametric 
tolerance intervals. 


POINT AND INTERVAL ESTIMATION 


Measurements are often used to 
make a point estimate, or best guess, 
about some quantity. In the fever- 
thermometer example, you would 
want, and would spontaneously make, 
such an estimate of the true tem- 
perature. What the best estimate is 
depends on what you need an estimate 
for and what penalty you associate 
with various possible errors, but a 
good case can often be made for the 
posterior mean, which minimizes the 
posterior mean squared error. For 
general scientific reporting there seems 
to be no other serious contender (see 
Savage, 1954, pp. 233-234). When 
the principle of stable estimation 
applies, the maximum-likelihood estimate
is often a good approximation
to the posterior mean. 
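
The claim that the posterior mean minimizes the posterior mean squared error follows from a one-line decomposition. The sketch below assumes that the posterior variance is finite and writes $m = E[\lambda \mid x]$ for the posterior mean and $a$ for any candidate point estimate; these two symbols are introduced here for illustration.

```latex
\begin{aligned}
E\!\left[(\lambda - a)^2 \mid x\right]
  &= E\!\left[(\lambda - m)^2 \mid x\right]
     + 2\,(m - a)\,E\!\left[\lambda - m \mid x\right]
     + (m - a)^2 \\
  &= \operatorname{Var}(\lambda \mid x) + (m - a)^2 .
\end{aligned}
```

The middle term vanishes because $E[\lambda - m \mid x] = 0$, so the error is smallest, and equal to the posterior variance, exactly when $a = m$.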

Classical statistics has also stressed 
interval, as opposed to point, esti- 
mates. Just what these are used for 
is hard to formulate (Savage, 1954, 
Section 17.2); they are, nonetheless, 
handy in thinking informally about 
specific applications of statistics. The 
Bayesian theory of interval estimation 
is simple. To name an interval that 
you feel 95% certain includes the true 
value of some parameter, simply 
inspect your posterior distribution of 
that parameter; any pair of points 
between which 95% of your posterior 
density lies defines such an interval. 
We call such intervals credible in- 
tervals, to distinguish them from the 
confidence intervals and fiducial in- 
tervals of classical statistics. 

Of course, somewhat as for classical 
interval estimates, there are an un- 
limited number of different credible 
intervals of any specified probability. 
One is centered geometrically on the 
posterior mean; one, generally a 
different one, has equal amounts of 
probability on each side of the pos- 
terior median. Some include nearly 
all, or all, of one tail of the posterior 
distribution; some do not. The 
choice, which is seldom delicate, de- 
pends on the application. One choice 
of possible interest is the shortest 
credible interval of a specified proba- 
bility; for unimodal, bilaterally sym- 
metric posterior distributions, it is 
centered on the posterior mean and
median. In the fever example, in
which an observation with standard
deviation .05° made the principle of
stable estimation applicable, the region
101° ± 1.96σ = 101° ± .098° is
the shortest interval containing approximately
95% of the posterior
probability; 100.83° to 101.08° and
100.92° to ∞ are also 95% credible
intervals, though asymmetric ones.
In certain examples like this one, the
smallest credible interval of a specified
credibility corresponds closely to the
most popular of the classical confidence
intervals having confidence
level equal to that credibility. But
in general credible intervals will differ 
from confidence intervals. 
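
The credible intervals quoted for the fever example can be verified directly from the approximate posterior distribution, taken here to be normal with mean 101° and standard deviation .05°, as stable estimation suggests. The Python sketch below uses `statistics.NormalDist` from the standard library (Python 3.8 or later); the helper names are illustrative.

```python
from statistics import NormalDist

# Approximate posterior for the true temperature in the fever example.
posterior = NormalDist(mu=101.0, sigma=0.05)


def credibility(dist, low, high):
    """Posterior probability that the parameter lies between low and high."""
    return dist.cdf(high) - dist.cdf(low)


def upper_tail(dist, low):
    """Posterior probability that the parameter exceeds low."""
    return 1.0 - dist.cdf(low)


# Shortest 95% interval: symmetric about the mean for a normal posterior.
z = NormalDist().inv_cdf(0.975)
half_width = z * posterior.stdev
shortest = (posterior.mean - half_width, posterior.mean + half_width)

print(f"shortest 95% interval: {shortest[0]:.3f} to {shortest[1]:.3f}")
print(f"100.83 to 101.08: {credibility(posterior, 100.83, 101.08):.3f}")
print(f"100.92 upward:    {upper_tail(posterior, 100.92):.3f}")
```

All three intervals carry roughly 95% posterior probability (the two asymmetric ones about .945), but the symmetric one is much the shortest of the three.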


INTRODUCTION TO HYPOTHESIS 
TESTING 


No aspect of classical statistics has 
been so popular with psychologists 
and other scientists as hypothesis 
testing, though some classical stat- 
isticians agree with us that the topic 
has been overemphasized. A stat- 
istician of great experience told us, 
“I don't know much about tests, 
because I have never had occasion to 
use one.” Our devotion of most of the 
rest of this paper to tests would be 
disproportionate, if we were not 
writing for an audience accustomed 
to think of statistics largely as testing. 

So many ideas have accreted to the 
word “test” that one definition cannot 
even hint at them. We shall first 
mention some of the main ideas 
relatively briefly, then flesh them out 
a bit with informal discussion of hy- 
pothetical substantive examples, and 
finally discuss technically some typical 
formal examples from a Bayesian 
point of view. Some experience with 
classical ideas of testing is assumed 
throughout. The pinnacle of the 
abstract theory of testing from the 
Neyman-Pearson standpoint is Leh- 
mann (1959). Laboratory thinking 
on testing may derive more from R. A. 
Fisher than from the Neyman-Pear- 
son school, though very few are 
explicitly familiar with Fisher’s ideas 
culminating in 1950 and 1956. 

The most popular notion of a test is, 
roughly, a tentative decision between 
two hypotheses on the basis of data,
and this is the notion that will dominate
the present treatment of tests.
Some qualification is needed if only 
because, in typical applications, one 
of the hypotheses—the null hypoth- 
esis—is known by all concerned to be 
false from the outset (Berkson, 1938; 
Hodges & Lehmann, 1954; Lehmann, 
1959; I. R. Savage, 1957; L. J. Savage, 
1954, p. 254); some ways of resolving 
the seeming absurdity will later be 
pointed out, and at least one of them 
will be important for us here. 

The Neyman-Pearson school of 
theoreticians, with their emphasis on 
the decision-theoretic or behavioral 
approach, tend to define a test as a 
choice between two actions, such as 
whether or not to air condition the 
ivory tower so the rats housed therein 
will behave more consistently. This 
definition is intended to clarify opera- 
tionally the meaning of decision 
between two hypotheses. For one 
thing, as Bayesians agree, such a 
decision resembles a potential dichot- 
omous choice in some economic 
situation such as a bet. Again, wher- 
ever there is a dichotomous economic 
choice, the possible values of the un- 
known parameters divide themselves 
into those for which one action or the 
other is appropriate. (The neutral 
zone in which both actions are equally 
appropriate is seldom important and 
can be dealt with in various ways.) 
Thus a dichotomous choice corre- 
sponds to a partition into two hy- 
potheses. Nonetheless, not every 
choice is like a simple bet, for economic 
differences within each hypothesis can 
be important. 

Sometimes the decision-theoretic 
definition of testing is expressed as a 
decision to act as though one or the 
other of the two hypotheses were 
believed, and that has apparently led 
to some confusion (Neyman, 1957, 
p. 16). What action is wise of course
depends in part on what is at stake.
You would not take the plane if you 
believed it would crash, and would not 
buy flight insurance if you believed 
it would not. Seldom must you 
choose between exactly two acts, one 
appropriate to the null hypothesis 
and the other to its alternative. 
Many intermediate, or hedging, acts 
are ordinarily possible; flying after 
buying flight insurance, and choosing 
a reasonable amount of flight insur- 
ance, are examples. 

From a Bayesian point of view, the 
special role of testing tends to evap- 
orate, yet something does remain. 
Deciding between two hypotheses in 
the light of the datum suggests to a 
Bayesian only computing their pos- 
terior probabilities; that a pair of 
probabilities are singled out for special 
attention is without theoretical in- 
terest. Similarly, a choice between 
two actions reduces to choosing the 
larger of two expected utilities under a 
posterior distribution. The feature of 
importance for the Bayesian was 
practically lost in the recapitulation 
of general classical definitions. This 
happened, in part, because the feature 
would seem incidental in a general 
classical theory though recognized by 
all as important in specific cases and, 
in part, because expression of the 
feature is uncongenial to classical 
language, though implicitly recog- 
nized by classical statisticians. 
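
The two reductions described in this paragraph, posterior probabilities for a pair of hypotheses and expected utilities for a pair of actions, can be illustrated with a small Python sketch. Every number in it (the prior probability, the two likelihoods, and the utilities) is hypothetical, and the function and action names are illustrative only.

```python
def posterior_probabilities(prior_h0, lik_h0, lik_h1):
    """Bayes' theorem for two exhaustive hypotheses.

    prior_h0: prior probability of H0 (H1 gets the remainder);
    lik_h0, lik_h1: probability of the datum under each hypothesis.
    """
    joint_h0 = prior_h0 * lik_h0
    joint_h1 = (1.0 - prior_h0) * lik_h1
    total = joint_h0 + joint_h1
    return joint_h0 / total, joint_h1 / total


def expected_utilities(post, utilities):
    """Expected utility of each action under the posterior probabilities.

    utilities maps an action name to (utility if H0 true, utility if H1 true).
    """
    return {
        action: sum(p * u for p, u in zip(post, payoffs))
        for action, payoffs in utilities.items()
    }


post = posterior_probabilities(prior_h0=0.5, lik_h0=0.02, lik_h1=0.08)
utilities = {
    "act as if H0": (1.0, -2.0),   # hypothetical payoffs
    "act as if H1": (-1.0, 1.0),
}
expected = expected_utilities(post, utilities)
best = max(expected, key=expected.get)

print("posterior probabilities:", tuple(round(p, 3) for p in post))
print("expected utilities:     ", {a: round(v, 3) for a, v in expected.items()})
print("preferred action:       ", best)
```

Nothing in the computation singles out the pair of hypotheses for special treatment; the same two steps would apply to any partition of the parameter values and any set of actions.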

In many problems, the prior density
u(λ) of the parameter(s) is
gentle enough relative to v(x|λ) to
permit stable estimation (or some
convenient variation of it). One
important way in which u(λ) can fail
to be sufficiently gentle is by concentrating
considerable probability close
to some point (or line, or surface, or
the like). Certain practical devices
can render the treatment of such a
concentration of probability relatively
public. These devices are, or should
be, only rather rarely needed, but 
they do seem to be of some importance 
and to constitute appropriate Bayes- 
ian treatment of some of the scientific 
situations in which the classical theory 
of hypothesis testing has been in- 
voked. At least occasionally, a pair 
of hypotheses is associated with the 
concentration of probability. For 
example, if the squirrel has not 
touched it, that acorn is almost sure 
to be practically where it was placed 
yesterday. For vividness and to 
maintain some parallelism with classi- 
cal expressions, we shall usually 
suppose concentration associated with 
a null hypothesis, as in this example; 
it is straightforward to extend the 
discussion to situations where there is 
not really such a pair of hypotheses. 
The theory of testing in the sense of 
dealing with concentrated probability 
as presented here draws heavily on 
Jeffreys (1939, see Ch. 5 and 6 of the 
1961 edition) and Lindley (1961). 

Examples. Discussion of a few 
examples may bring out some points 
associated with the various concepts 
of testing. 

Example 1. Two teaching-machine 
programs for sixth-grade arithmetic 
have been compared experimentally. 

For some purposes each program 
might be characterized by a single 
number, perhaps the mean difference 
between pretest and posttest perform- 
ance on some standardized test of 
proficiency in arithmetic. This num- 
ber, an index of the effectiveness of 
the program, must of course be com- 
bined with economic and other in- 
formation from outside the experiment 
itself if the experiment is to guide 
some practical decision. 

If one of the two programs must be 
adopted, the problem is one of testing 
in the sense of the general
decision-theoretic definition, yet it is likely
to be such that practicing statisticians
would not ordinarily call the appro- 
priate procedure a test at all. Unless 
your prior opinion perceptibly favored 
one of the two programs, you should 
plainly adopt that one which seemed, 
however slightly, to do better in the 
experiment. The classical counter- 
part of this simple conclusion had to 
be discovered against the tendency to 
invoke “significance tests” in all 
testing situations (Bahadur & Rob- 
bins, 1950). 

But suppose one program is much 
more expensive to implement than 
the other. If such information about 
costs is available, it can be combined 
with information provided by the 
experiment to indicate how much 
proficiency can be bought for how 
many dollars. It is then a matter of 
judgment whether to make the pur- 
chase. In principle the judgment is 
simply one of the dollar value of 
proficiency (or equivalently of the 
proficiency value of dollars); in prac- 
tice, such judgments are often difficult 
and controversial. 

If the experiment is indecisive, 
should any decision be risked? Of 
course it should be if it really must be. 
In many actual situations there are 
alternatives such as further experi- 
mentation. The choice is then really 
at least trichotomous but perhaps 
with dichotomous emphasis on con- 
tinuing, as opposed to desisting from, 
experimentation. Such suggestions as 
to continue only if the difference is not 
significant at, say, the 5% level are 
sometimes heard. Many classical 
theorists are dissatisfied with this 
approach, and we believe Bayesian 
statistics can do better (see Raiffa & 
Schlaifer, 1961, for some progress in 
this direction). 

Convention asks, “Do these two
programs differ at all in effectiveness?”
Of course they do. Could any real
difference in the programs fail to
induce at least some slight difference 
in their effectiveness? Yet the differ- 
ence in effectiveness may be negligible 
compared to the sensitivity of the 
experiment. In this way, the con- 
ventional question can be given 
meaning, and we shall often ask it 
without further explanation or apol- 
ogy. A closely related question would 
be, “Is the superiority of Method A 
over Method B pointed to by the 
experiment real, taking due account 
of the possibility that the actual 
difference may be very small?” With 
several programs, the number of 
questions about relative superiority 
rapidly multiplies. 

Example 2. Can this subject guess 
the color of a card drawn from a 
hidden shuffled bridge deck more or 
less than 50% of the time? 

This is an instance of the conven- 
tional question, “Is there any differ- 
ence at all?” so philosophically the 
answer is presumably “yes,” though
in the last analysis the very meaning- 
fulness of the question might be 
challenged. We would not expect any 
such ostensible effect to stand up from 
one experiment to another in magni- 
tude or direction. We are strongly 
prejudiced that the inevitable small 
deviations from the null hypothesis 
will always turn out to be somehow 
artifactual—explicable, for instance, 
in terms of defects in the shuffling or 
concealing of the cards or the record- 
ing of the data and not due to Extra- 
Sensory Perception (ESP). 

One who is so prejudiced has no 
need for a testing procedure, but 
there are examples in which the null 
hypothesis, very sharply interpreted, 
commands some but not utter cre- 
dence. The present example is such 
a one for many, more open minded 
about ESP than we, and even we can
imagine, though we do not expect, 
phenomena that would shake our 
disbelief. 

Example 3. Does this packed suit- 
case weigh less than 40 pounds? The 
reason you want to know is that the 
airlines by arbitrary convention charge 
overweight for more. The conven- 
tional weight, 40 pounds, plays little 
special role in the structure of your 
opinion which may well be diffuse 
relative to the bathroom scale. If the 
scale happens to register very close to 
40 pounds (and you know its preci- 
sion), the theory of stable estimation 
will yield a definite probability that 
the suitcase is overweight. If the 
reading is not close, you will have 
overwhelming conviction, one way or 
the other, but the odds will be very 
vaguely defined. For the conditions 
are ill suited to stable estimation if 
only because the statistical model of 
the scale is not sufficiently credible. 

If the problem is whether to leave 
something behind or to put in another 
book, the odds are not a sufficient 
guide. Taking the problem seriously, 
you would have to reckon the cash 
cost of each amount of overweight 
and the cash equivalent to you of 
leaving various things behind in order 
to compute the posterior expected 
worth of various possible courses of 
action. 

We shall discuss further the applica- 
tion of stable estimation to this ex- 
ample, for this is the one encounter 
we shall have with a Bayesian 
procedure at all harmonious with a 
classical tail-area significance test. 
Assume, then, that a normally dis- 
tributed observation x has been made, 
with known standard deviation σ, and
that your prior opinion about the 
weight of your suitcase is diffuse 
relative to the measurement. The 
principle of stable estimation applies, 
so, as an acceptable approximation,

\[P(\lambda \le 40 \mid x) = \Phi\left(\frac{40 - x}{\sigma}\right) = \Phi(t),\]

in case |t| is not too great. In words,
the probability that your suitcase 
weighs at most 40 pounds, in the light 
of the datum x, is the probability to 
the left of t under the standard normal 
distribution. Almost by accident, 
this is also the one-tailed significance 
level of the classical t test for the
hypothesis that λ ≤ 40. The fundamental
interpretation of Φ(t) here is
the probability for you that your 
suitcase weighs less than 40 pounds; 
just the sort of thing that classical 
statistics rightly warns us not to 
expect a significance level to be. 
Problems in which stable estimation 
leads exactly to a one-tailed classical 
significance level are of very special 
structure. No Bayesian procedure 
yet known looks like a two-tailed test 
(Schlaifer, 1961, p. 212). 
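
The suitcase arithmetic can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the text (normal measurement error with known σ, prior diffuse relative to the measurement); the function names and the hypothetical reading of 39.2 pounds with σ = 0.5 are not from the original.

```python
import math

def standard_normal_cdf(t: float) -> float:
    """Phi(t): area to the left of t under the standard normal density."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def prob_at_most(limit: float, x: float, sigma: float) -> float:
    """Approximate P(weight <= limit | x) under stable estimation,
    assuming normal measurement error with known sigma and a prior
    that is diffuse relative to the measurement."""
    t = (limit - x) / sigma
    return standard_normal_cdf(t)

# Hypothetical reading: the scale shows 39.2 lb and sigma is 0.5 lb.
p = prob_at_most(40.0, x=39.2, sigma=0.5)
print(f"t = {(40.0 - 39.2) / 0.5:.2f}, P(weight <= 40 | x) = {p:.4f}")
```

The same number is the one-tailed significance level, but here it is read directly as a probability about the suitcase.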

Classical one-tailed tests are often 
recommended for a situation in which 
Bayesian treatment would call for 
nothing like them. Imagine, for 
instance, an experiment to determine 
whether schizophrenia impairs prob- 
lem solving ability, supposing it all 
but inconceivable that schizophrenia 
enhances the ability. This is classi- 
cally a place to use a one-tailed test; 
the Bayesian recommendations for 
this problem, which will not be ex- 
plored here, would not be tail-area 
tests and would be rather similar to 
the Bayesian null hypothesis tests 
discussed later. One point recognized 
by almost all is that if schizophrenia 
can do no good it must then do some 
harm, though perhaps too little to 
perceive.

Before putting the suitcase on the 
bathroom scales you have little ex- 
pectation of applying the formal 
arithmetic of the preceding paragraphs.
At that time, your opinion
about the weight of the suitcase is 
diffuse. Therefore, no interval as 
small as 6 or 8 σ can include much of
your initial probability. On the other
hand, if |t| is greater than 3 or 4,
which you very much expect, you 
will not rely on normal tail-area 
computations, because that would put 
the assumption of normality to un- 
reasonable strain. Also Assumption 2 
of the discussion of stable estimation 
will probably be drastically violated. 
You will usually be content in such a 
case to conclude that the weight of the 
suitcase is, beyond practical doubt, 
more (or less) than 40 pounds. 

The preceding paragraph illustrates 
a procedure that statisticians of all 
schools find important but elusive. 
It has been called the interocular 
traumatic test;² you know what the
data mean when the conclusion hits 
you between the eyes. The inter- 
ocular traumatic test is simple, com- 
mands general agreement, and is often 
applicable; well-conducted experi- 
ments often come out that way. But 
the enthusiast’s interocular trauma 
may be the skeptic’s random error. 
A little arithmetic to verify the extent 
of the trauma can yield great peace of 
mind for little cost. 

² J. Berkson, personal communication, July 14, 1958.

Odds and likelihood ratios. Gamblers
frequently measure probabilities
in terms of odds. Your odds in favor
of the event A are (aside from utility
effects) the amount that you would
just be willing to pay if A does not
occur in compensation for a commitment
from someone else to pay you
one unit of money if A does occur.
The odds Ω(A) in favor of A are thus
related to the probability P(A) of A
and the probability 1 − P(A) of not
A, or Ā, by the condition,

\[\Omega(A)\,[1 - P(A)] = P(A).\]


Odds and probability are therefore 
translated into each other thus, 


\[\Omega(A) = \frac{P(A)}{1 - P(A)} = \frac{P(A)}{P(\bar{A})}, \qquad P(A) = \frac{\Omega(A)}{1 + \Omega(A)}.\]


For example, odds of 1, an even-money 
bet, correspond to a probability of 
1/2; a probability of 9/10 corresponds 
to odds of 9 (or 9 to 1), and a proba- 
bility of 1/10 corresponds to odds of 
1/9 (or 1 to 9). If P(A) is 0, Ω(A) is
plainly 0; and if P(A) is 1, Ω(A) may
be called ∞, if it need be defined at all.
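
The two conversion formulas can be checked with exact rational arithmetic. The sketch below is illustrative; the function names are assumptions introduced here, and the case P(A) = 1 is simply rejected rather than mapped to an infinite value.

```python
from fractions import Fraction

def odds_from_probability(p: Fraction) -> Fraction:
    """Omega(A) = P(A) / (1 - P(A)); infinite, hence rejected, at P(A) = 1."""
    if p == 1:
        raise ValueError("odds are infinite when P(A) = 1")
    return p / (1 - p)

def probability_from_odds(odds: Fraction) -> Fraction:
    """P(A) = Omega(A) / (1 + Omega(A))."""
    return odds / (1 + odds)

for p in (Fraction(1, 2), Fraction(9, 10), Fraction(1, 10), Fraction(0)):
    omega = odds_from_probability(p)
    # Converting back recovers the original probability exactly.
    assert probability_from_odds(omega) == p
    print(f"P(A) = {p}  ->  odds = {omega}")
```
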
From a Bayesian standpoint, part 
of what is suggested by “testing” is 
finding the posterior probability 
P(A|D) of the hypothesis A in the 
light of the datum D, or equivalently, 
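finding the posterior odds Ω(A|D).

According to Bayes’ theorem,

\[P(A \mid D) = \frac{P(D \mid A)\,P(A)}{P(D)}, \qquad [7]\]

\[P(\bar{A} \mid D) = \frac{P(D \mid \bar{A})\,P(\bar{A})}{P(D)}. \qquad [8]\]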


Dividing each side of Equation 7 by 
the corresponding side of Equation 8, 
canceling the common denominators 
P(D), and making evident abbrevia- 
tions leads to a condensation of 
Equations 7 and 8 in terms of odds:

\[\Omega(A \mid D) = \frac{P(D \mid A)}{P(D \mid \bar{A})}\,\Omega(A) = L(A; D)\,\Omega(A). \qquad [9]\]

In words, the posterior odds in favor
of A given the datum D are the prior
odds multiplied by the ratio of the
conditional probabilities of the datum
given the hypothesis A and given its
negation. The ratio of conditional
probabilities L(A; D) is called the
likelihood ratio in favor of the hypothesis
A on the basis of the datum D.

Plainly, and according to Equation
9, D increases the odds for A if and
only if D is more probable under A
than under its negation Ā, so that
L(A; D) is greater than 1.

If D is impossible under Ā, Equation 9
requires an illegitimate division,
but it can fairly be interpreted to say
that A has acquired probability 1
unless Ω(A) = 0, in which case the
problem is ill specified. With that
rather academic exception, whenever
Ω(A) is 0 so is Ω(A|D); roughly, once
something is regarded as impossible, 
no evidence can reinstate its credi- 
bility. 

In actual practice, L(A; D) and 
Ω(A) tend to differ from person to
person. Nonetheless, statistics is 
particularly interested in examining 
how and when Equation 9 can lead to 
relatively public conclusions, a theme 
that will occupy several sections. 


Simple dichotomy. It is useful, at least for 
exposition, to consider problems in which 
L(A;D) is entirely public. For example, 
someone whose word you and we trust might 
tell us that the die he hands us produces 6's 
either (A) with frequency 1/6 or (Ā) with
frequency 1/5. Your initial opinion Ω(A)
might differ radically from ours. But, for you
and for us, the likelihood ratio in favor of A
on the basis of a 6 is (1/6)/(1/5) or 5/6, and
the likelihood ratio in favor of A on the basis
of a non-6 is (5/6)/(4/5) or 25/24. Thus, if
a 6 appears when the die is rolled, everyone's
confidence in A will diminish slightly;
specifically, odds in favor of A will be
diminished by the factor 5/6. Similarly, a non-6
will augment Ω(A) by the factor 25/24.
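
A small sketch of this single-roll update, using exact fractions. The two prior odds shown are hypothetical stand-ins for "your" and "our" differing initial opinions; the likelihood ratios are the public part.

```python
from fractions import Fraction

# Likelihood ratios in favor of A (6's with frequency 1/6)
# against its negation (6's with frequency 1/5).
LR_SIX = Fraction(1, 6) / Fraction(1, 5)      # 5/6
LR_NON_SIX = Fraction(5, 6) / Fraction(4, 5)  # 25/24

def posterior_odds(prior_odds: Fraction, rolled_six: bool) -> Fraction:
    """Equation 9: posterior odds = likelihood ratio * prior odds."""
    return (LR_SIX if rolled_six else LR_NON_SIX) * prior_odds

# Two observers with very different (hypothetical) prior odds.
for prior in (Fraction(1, 1000), Fraction(3, 1)):
    print(prior, "->", posterior_odds(prior, True), "after a 6;",
          posterior_odds(prior, False), "after a non-6")
```

Both observers multiply by the same factor, whatever their starting odds.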

If such a die could be rolled only once, the 
resulting evidence L(A; D) would be negli- 
gible for almost any purpose; if it can be 
rolled many times, the evidence is ultimately 
sure to become definitive. As is implicit in 
the concept of the not necessarily fair die, 
if D₁, D₂, D₃, ... are the outcomes of successive
rolls, then the same function L(A; D) applies
to each. Therefore Equation 9 can be applied
repeatedly, thus:

\[\begin{aligned}
\Omega(A \mid D_1) &= L(A; D_1)\,\Omega(A), \\
\Omega(A \mid D_2, D_1) &= L(A; D_2)\,\Omega(A \mid D_1) \\
&= L(A; D_2)\,L(A; D_1)\,\Omega(A), \\
&\;\;\vdots \\
\Omega(A \mid D_n, \ldots, D_1) &= L(A; D_n)\,\Omega(A \mid D_{n-1}, \ldots, D_1) \\
&= L(A; D_n)\,L(A; D_{n-1}) \cdots L(A; D_1)\,\Omega(A) \\
&= \prod_{i=1}^{n} L(A; D_i)\,\Omega(A).
\end{aligned}\]


This multiplicative composition of likelihood 
ratios exemplifies an important general 
principle about observations which are
independent given the hypothesis.

For the specific example of the die, if x 
6's and y non-6's occur (where of course 
x+y =n), then 


\[\Omega(A \mid D_1, \ldots, D_n) = \left(\frac{5}{6}\right)^{x} \left(\frac{25}{24}\right)^{y} \Omega(A).\]


For large n, if A obtains, it is highly probable 
at the outset that x/n will fall close to 1/6. 
Similarly, if A does not obtain x/n will 
probably fall close to 1/5. Thus, if A obtains, 
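the overall likelihood ratio (5/6)^x (25/24)^y will
probably be very roughly

\[\left(\frac{5}{6}\right)^{n/6} \left(\frac{25}{24}\right)^{5n/6} \approx (1.00364)^{n} \approx 10^{0.00158\,n}.\]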


By the time n is 1,200 everyone's odds in favor 
of A will probably be augmented about a 
hundredfold, if A is in fact true. One who 
started very skeptical of A, say with Ω(A)
about a thousandth, will still be rather
skeptical. But he would have to start from a
very skeptical position indeed not to become
strongly convinced when n is 6,300 and the
overall likelihood ratio in favor of A is about
10 billion.

The arithmetic for Ā is:

\[\left(\frac{5}{6}\right)^{n/5} \left(\frac{25}{24}\right)^{4n/5} \approx (0.99620)^{n} \approx 10^{-0.00165\,n}.\]

So the rate at which evidence accumulates
against A, and for Ā, when Ā is true is in this
case a trifle more than the rate at which it
accumulates for A when A is true.
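
The two rates can be recomputed directly. The sketch below is only a check on the arithmetic, assuming that x/n sits exactly at its long-run value (1/6 under A, 1/5 under its negation); the function name is introduced here for illustration.

```python
import math

LOG_LR_SIX = math.log10(5 / 6)        # a 6 counts against A
LOG_LR_NON_SIX = math.log10(25 / 24)  # a non-6 counts for A

def expected_log10_lr_per_roll(freq_six: float) -> float:
    """Expected log10 likelihood ratio (in favor of A) contributed by one
    roll when 6's actually occur with long-run frequency freq_six."""
    return freq_six * LOG_LR_SIX + (1 - freq_six) * LOG_LR_NON_SIX

rate_if_a = expected_log10_lr_per_roll(1 / 6)      # A true
rate_if_not_a = expected_log10_lr_per_roll(1 / 5)  # negation of A true

print(f"A true:     {rate_if_a:+.5f} per roll")
print(f"not-A true: {rate_if_not_a:+.5f} per roll")
for n in (1200, 6300):
    print(f"n = {n}: typical overall ratio if A is true ~ 10^{n * rate_if_a:.2f}")
```

The magnitude of the second rate slightly exceeds the first, which is the "trifle more" of the text.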

Simple dichotomy is instructive for sta- 
tistical theory generally but must be taken 
with a grain of salt. For simple dichotomies,
that is, applications of Equation 9 in which
everyone concerned will agree and be clear
about the values of L(A; D), rarely, if ever,
occur in scientific practice. Public models 
almost always involve parameters rather than 
finite partitions. 

Some generalizations are apparent in what 
has already been said about simple dichotomy. 


At a given moment, let us suppose, you
have to guess whether it is A or Ā that
obtains and you will receive $I if you guess
A correctly and $J if you guess Ā correctly.

The expected cash value to you of guessing A
is $IP(A) and that of guessing Ā is $JP(Ā).
You will therefore prefer to guess A if, and
only if, $IP(A) exceeds $JP(Ā); that is,
if Ω(A) exceeds J/I. (More rigorous treatment
would replace dollars with utiles.)

After a datum D has been observed, you will
prefer to guess A if, and only if,
Ω(A|D) exceeds J/I. Putting this together
with Equation 9, you will prefer to guess A
if, and only if,

\[L(A; D) > \frac{J}{I\,\Omega(A)} = \Lambda,\]

where your critical likelihood ratio Λ is
defined by the context.

Classical statisticians were the first to conclude that there
must be some Λ such that you will guess A if
L(A; D) > Λ and guess Ā if L(A; D) < Λ.
(For this sketch, it is excusable to neglect the
possibility that Λ = L(A; D).) By and large,
classical statisticians say that the choice of Λ
is an entirely subjective one which no one but
you can make (e.g., Lehmann, 1959, p. 62).
Bayesians agree; for according to Equation 9,
Λ is inversely proportional to your current
odds for A, an aspect of your personal opinion.
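
The guessing rule reduces to comparing the likelihood ratio with a single number. The following sketch assumes the payoff scheme described above ($I for a correct guess of A, $J for a correct guess of its negation); the function names and the dollar amounts are hypothetical.

```python
def critical_likelihood_ratio(payoff_i: float, payoff_j: float,
                              prior_odds: float) -> float:
    """Lambda = J / (I * Omega(A)): guess A when L(A; D) exceeds it."""
    return payoff_j / (payoff_i * prior_odds)

def guess(likelihood_ratio: float, payoff_i: float, payoff_j: float,
          prior_odds: float) -> str:
    """Return the guess with the larger posterior expected cash value."""
    lam = critical_likelihood_ratio(payoff_i, payoff_j, prior_odds)
    return "A" if likelihood_ratio > lam else "not A"

# Hypothetical stakes: $10 for a correct guess of A, $20 for a correct
# guess of not-A, and prior odds of 1 (even money) on A.
print(critical_likelihood_ratio(10, 20, 1.0))   # Lambda = 2.0
print(guess(25 / 24, 10, 20, 1.0))
print(guess(3.0, 10, 20, 1.0))
```

Nothing about the size or design of the experiment enters the threshold; only the stakes and the prior odds do.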

The classical statisticians, however, have 
overlooked a great simplification, namely that 
your critical Λ will not depend on the size or
structure of the experiment and will be 
proportional to J/I. Once the Bayesian 
position is accepted, Equation 9 is of course 
an argument for this simplification, but it can 
also be arrived at along a classical path, which 
in effect derives much, if not all, of Bayesian 
statistics as a natural completion of the 
classical decision-theoretic position. This 
relation between the two views, which in no 
way depends on the artificiality of simple 
dichotomy here used to illustrate it, cannot 
be overemphasized. (For a general demon- 
stration, see Raiffa & Schlaifer, 1961, pp. 
24-27.) 

The simplification is brought out by the set 
of indifference curves among the various 
probabilities of the two kinds of errors (Leh- 
mann, 1958). Of course, any reduction of the 
probability of one kind of error is desirable 
if it does not increase the probability of the 
other kind of error, and the implications of 
classical statistics leave the description of the 
indifference curves at that. But the con- 
siderations discussed easily imply that the 
indifference curves should be parallel straight 
lines with slope −J/[IΩ(A)]. As Savage
(1962b) puts it: 


the subjectivist’s position is more objective 
than the objectivist’s, for the subjectivist 
finds the range of coherent or reasonable 
preference patterns much narrower than the 
objectivist thought it to be. How confusing 
and dangerous big words are [p. 67]! 


Classical statistics tends to divert attention 
from Λ to the two conditional probabilities of
making errors, by guessing A when Ā obtains
and vice versa. The counterpart of the
probabilities of these two kinds of errors in
more general problems is called the operating
characteristic, and classical statisticians suggest,
in effect, that you should choose among
the available operating characteristics as a
method of choosing Λ, or more generally,
your prior distribution. This is not mathematically
wrong, but it distracts attention
from your value judgments and opinions
about the unknown facts upon which your
preferred Λ should directly depend without
regard to how the probabilities of errors vary
with Λ in a specific experiment.

There are important advantages to recognizing
that your Λ does not depend on the
structure of the experiment. It will help you, 
for example, to choose between possible 
experimental plans. It leads immediately to 
the very important likelihood principle, which 
in this application says that the numerical 
value of the likelihood ratio of the datum
conveys the entire import of the datum. (A
later section is about the likelihood principle.)
Wolfowitz (1962) dissents.


Approaches to null hypothesis testing. 
Next we examine situations in which 
a very sharp, or null, hypothesis is 
compared with a rather flat or diffuse 
alternative hypothesis. This short 
section indicates general strategies of 
such comparisons. None of the com- 
putations or conclusions depend on 
assumptions about the special initial 
credibility of the null hypothesis, but 
a Bayesian will find such computa- 
tions uninteresting unless a non- 
negligible amount of his prior proba- 
bility is concentrated very near the 
null hypothesis value. 

For the continuous cases to be 
considered in following sections, the 
hypothesis A is that some parameter λ
is in a set that might as well also be
called A. For one-dimensional cases
in which the hypothesis A is that λ is
almost surely negligibly far from some
specified value λ₀, the odds in favor
of A given the datum D, as in 
Equation 9, are 


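\[\Omega(A \mid D) = \frac{P(A \mid D)}{P(\bar{A} \mid D)} = \frac{v(D \mid \lambda_0)}{\int v(D \mid \lambda)\,u(\lambda \mid \bar{A})\,d\lambda}\,\Omega(A) = L(A; D)\,\Omega(A).\]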


Natural generalizations apply to 
multidimensional cases. The numerator
v(D|λ₀) will in usual applications
be public. But the denominator,
the probability of D under the alter- 
native hypothesis, depends on the 
usually far from public prior density 
under the alternative hypothesis. 
Nonetheless, there are some relatively 
public methods of appraising the 
denominator, and much of the following
discussion of tests is, in effect,
about such methods. Their spirit is 
opportunistic, bringing to bear what- 
ever approximations and bounds offer 
themselves in particular cases. The 
main ideas of these methods are 
sketched in the following three para- 
graphs, which will later be much 
amplified by examples. 

First, the principle of stable estimation
may apply to the datum and to
the density u(λ|Ā) of λ given the
alternative hypothesis Ā. In this
case, the likelihood ratio reflects no
characteristics of u(λ|Ā) other than
its value in the neighborhood favored 
by the datum, a number that can be 
made relatively accessible to intro- 
spection. 

Second, it is relatively easy, in any 
given case, to determine how small the 
likelihood ratio can possibly be made 
by utterly unrestricted and artificial 
choice of the function u(λ|Ā). If
this rigorous public lower bound on 
the likelihood ratio is not very small, 
then there exists no system of prior 
probabilities under which the datum 
greatly detracts from the credibility 
of the null hypothesis. Remarkably, 
this smallest possible bound is by no 
means always very small in those 
cases when the datum would lead to a 
high classical significance level such 
as .05 or .01. Less extreme (and 
therefore larger) lower bounds that do 
assume some restriction on u(λ|Ā)
are sometimes appropriate; analogous 
restrictions also lead to upper bounds. 
When these are small, the datum does 
rather publicly greatly lower the 
credibility of the null hypothesis. 
Analysis to support an interocular 
traumatic impression might often be 
of this sort. Inequalities stated more 
generally by Hildreth (1963) are 
behind most of these lower and upper 
bounds. 
Finally, when v(D|λ) admits of a
conjugate family of distributions, it
may be useful, as an approximation,
to suppose u(λ|Ā) restricted to the
conjugate family. Such a restriction 
may help fix reasonably public bounds 
to the likelihood ratio. 

We shall see that classical pro- 
cedures are often ready severely to 
reject the null hypothesis on the basis 
of data that do not greatly detract 
from its credibility, which dramat- 
ically demonstrates the practical dif- 
ference between Bayesian and classical 
statistics. This finding is not alto- 
gether new. In particular, Lindley 
(1957) has proved that for any 
classical significance level for rejecting 
the null hypothesis (no matter how 
small) and for any likelihood ratio in 
favor of the null hypothesis (no 
matter how large), there exists a 
datum significant at that level and 
with that likelihood ratio. 

To prepare intuition for later tech- 
nical discussion we now show in- 
formally, as much as possible from a 
classical point of view, how evidence 
that leads to classical rejection of a 
null hypothesis at the .05 level can 
favor that null hypothesis. The loose 
and intuitive argument can easily be 
made precise (and is, later in the 
paper). Consider a two-tailed t test
with many degrees of freedom. If a
true null hypothesis is being tested,
t will exceed 1.96 with probability
2.5% and will exceed 2.58 with
probability .5%. (Of course, 1.96 and
2.58 are the 5% and 1% two-tailed
significance levels; the other 2.5% and
.5% refer to the possibility that t may
be smaller than −1.96 or −2.58.)
So on 2% of all occasions when true
null hypotheses are being tested, t
will lie between 1.96 and 2.58. How
often will t lie in that interval when
the null hypothesis is false? That 
depends on what alternatives to the
null hypothesis are to be considered.
Frequently, given that the null hypothesis
is false, all values of t between,
say, −20 and +20 are about
equally likely for you. Thus, when
the null hypothesis is false, t may well
fall in the range from 1.96 to 2.58 with
at most the probability (2.58 − 1.96)/
[+20 − (−20)] = 1.55%. In such
a case, since 1.55 is less than 2, the
occurrence of t in that interval speaks
mildly for, not vigorously against, the 
truth of the null hypothesis. 
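
The comparison of 2% with 1.55% can be reproduced numerically. This sketch uses the normal approximation for a t with many degrees of freedom and takes the uniform spread over (−20, +20) from the text as the illustrative alternative; the variable names are introduced here.

```python
import math

def standard_normal_cdf(t: float) -> float:
    """Phi(t): area to the left of t under the standard normal density."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# Probability that t falls in (1.96, 2.58) when the null hypothesis is true,
# using the normal approximation for many degrees of freedom.
p_null = standard_normal_cdf(2.58) - standard_normal_cdf(1.96)

# Upper bound on the same probability when the null is false and t is
# taken to be roughly uniform over (-20, +20).
p_alt = (2.58 - 1.96) / (20 - (-20))

print(f"under the null:        {p_null:.4f}")
print(f"under the alternative: {p_alt:.4f}")
print(f"ratio (null / alternative): {p_null / p_alt:.2f}")
```

A ratio above 1 means the observation is somewhat more probable under the null hypothesis than under this alternative.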

This argument, like almost all the 
following discussion of null hypothesis 
testing, hinges on assumptions about 
the prior distribution under the alter- 
native hypothesis. The classical stat- 
istician usually neglects that dis- 
tribution—in fact, denies its existence. 
He considers how unlikely a t as far
from 0 as 1.96 is if the null hypothesis 
is true, but he does not consider that 
a t as close to 0 as 1.96 may be even 
less likely if the null hypothesis is 
false. 

A Bernoullian example. To begin 
a more detailed examination of Bayes- 
ian methods for evaluating null hy- 
potheses, consider this example: 

We are studying a motor skills 
task. Starting from a neutral rest 
position, a subject attempts to touch 
a stylus as near as possible to a long, 
straight line. We are interested in 
whether his responses favor the right 
or the left of the line. Perhaps from 
casual experience with such tasks, we 
give special credence to the possibility 
that his long-run frequency p of 
“rights” is practically p₀ = 1/2. The
problem is here posed in the more
familiar frequentistic terminology; its
Bayesian translation, due to de Finetti,
is sketched in Section 3.7 of
Savage (1954). The following discussion
applies to any fraction p₀ as
well as to the specific value 1/2.
Under the null hypothesis, your 
density of the parameter p is sharply 
concentrated near p₀, while your
density of p under the alternative 
hypothesis is not concentrated and 
may be rather diffuse over much of 
the interval from 0 to 1. 

If n trials are undertaken, the
probability of obtaining r rights given
that the true frequency is p is of
course \(C_r^n p^r (1 - p)^{n-r}\). The probability
of obtaining r under the null
hypothesis that p is literally p₀ is
\(C_r^n p_0^r (1 - p_0)^{n-r}\). Under the alternative
hypothesis, it is

\[\int_0^1 C_r^n\, p^r (1 - p)^{n-r}\, u(p \mid H_1)\, dp,\]


that is, the probability of r given p 
averaged over p, with each value in 
the average weighted by its prior 
density under the alternative hy- 
pothesis. The likelihood ratio is 
therefore 


\[L(p_0; r, n) = \frac{p_0^r (1 - p_0)^{n-r}}{\int_0^1 p^r (1 - p)^{n-r}\, u(p \mid H_1)\, dp}. \qquad [10]\]


The disappearance of \(C_r^n\) from the
likelihood ratio by cancellation is
related to the likelihood principle,
which will be discussed later. Had
the experiment not been analyzed
with a certain misplaced sophistication,
\(C_r^n\) would never have appeared
in the first place. We would simply
have noted that the probability of
any specific sequence of rights and
lefts with r rights and n − r lefts is,
given p, exactly \(p^r (1 - p)^{n-r}\). That
the number of different sequences of
this composition is \(C_r^n\) is simply irrelevant
to Bayesian inference about p.

One possible way to reduce the 
denominator of Equation 10 to more
tractable form is to apply the principle
of stable estimation, or more
accurately certain variants of it, to
the denominator. To begin with, if
u(p|H₁) were a constant u′, then the
denominator would be

\[\int_0^1 p^r (1 - p)^{n-r}\, u(p \mid H_1)\, dp = u' \int_0^1 p^r (1 - p)^{n-r}\, dp = \frac{u'}{(n + 1)\, C_r^n}. \qquad [11]\]

The first equality is evident; the 
second is a known formula, enchant- 
ingly demonstrated by Bayes (1763). 
Of course u cannot really be a constant
unless it is 1, but if r and n − r
are both fairly large \(p^r (1 - p)^{n-r}\) is a
sharply peaked function with its
maximum at r/n. If u(p|H₁) is
gentle near r/n and not too wild
elsewhere, Equation 11 may be a
satisfactory approximation, with u′
= u(r/n|H₁). This condition is often
met, and it can be considerably 
weakened without changing the con- 
clusion, as will be explained next. 
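
The second equality of Equation 11 is easy to confirm numerically. The sketch below compares the closed form with a plain midpoint-rule integration; the function names, the step count, and the illustrative values r = 32, n = 50 are choices made here, not part of the original argument.

```python
import math

def beta_integral_exact(r: int, n: int) -> float:
    """Second equality of Equation 11: 1 / ((n + 1) * C(n, r))."""
    return 1.0 / ((n + 1) * math.comb(n, r))

def beta_integral_numeric(r: int, n: int, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of p^r (1-p)^(n-r)
    over the interval from 0 to 1."""
    h = 1.0 / steps
    return h * sum(((k + 0.5) * h) ** r * (1 - (k + 0.5) * h) ** (n - r)
                   for k in range(steps))

r, n = 32, 50   # illustrative values
print(beta_integral_exact(r, n), beta_integral_numeric(r, n))
```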


If the graph of u(p|H₁) were a straight,
though not necessarily horizontal, line then
the required integral would be

\[\int_0^1 p^r (1 - p)^{n-r}\, u(p \mid H_1)\, dp = \frac{u\left(\tfrac{r + 1}{n + 2} \mid H_1\right)}{(n + 1)\, C_r^n}. \qquad [12]\]

This is basically a standard formula like the
latter part of Equation 11, and is in fact
rather easily inferred from that earlier formula
itself. Consequently, for large r and n − r,
Equation 12 can be justified as an approximation
with u′ = u[(r + 1)/(n + 2)|H₁] whenever
u(p|H₁) is nearly linear in the neighborhood
of (r + 1)/(n + 2), which under the
assumed conditions is virtually indistinguishable
from r/n.
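
For a prior density that is exactly linear, Equation 12 holds exactly, and this can be verified with rational arithmetic. The sketch assumes a hypothetical linear density u(p) = c0 + c1·p; the helper names are introduced here.

```python
from fractions import Fraction
from math import comb, factorial

def beta_fn(a: int, b: int) -> Fraction:
    """B(a, b) = (a-1)! (b-1)! / (a+b-1)! for positive integers a, b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def linear_prior_integral(r: int, n: int, c0: Fraction, c1: Fraction) -> Fraction:
    """Exact integral over [0, 1] of p^r (1-p)^(n-r) * (c0 + c1*p)."""
    return c0 * beta_fn(r + 1, n - r + 1) + c1 * beta_fn(r + 2, n - r + 1)

def equation_12(r: int, n: int, c0: Fraction, c1: Fraction) -> Fraction:
    """u((r+1)/(n+2)) / ((n+1) C(n, r)) for the linear density u."""
    u_at_point = c0 + c1 * Fraction(r + 1, n + 2)
    return u_at_point / ((n + 1) * comb(n, r))

# Hypothetical linear density u(p) = 1/2 + p, which integrates to 1 on [0, 1].
c0, c1 = Fraction(1, 2), Fraction(1)
assert linear_prior_integral(7, 10, c0, c1) == equation_12(7, 10, c0, c1)
print(equation_12(7, 10, c0, c1))
```

The agreement rests on the ratio B(r + 2, n − r + 1)/B(r + 1, n − r + 1) = (r + 1)/(n + 2).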


In summary, it is often suitable to 
approximate the likelihood ratio thus: 


\[L(p_0; r, n) \approx \frac{n + 1}{u'}\, C_r^n\, p_0^r (1 - p_0)^{n-r} = \frac{n + 1}{u'}\, P(r \mid p_0, n), \qquad [13]\]

where u′ = u(r/n|H₁) or u[(r + 1)/
(n + 2)|H₁].

Does this approximation apply to 
you in a specific case? If so, what 
value of u′ is appropriate? Such
subjective questions can be answered
only by self-interrogation along lines
suggested by our discussion of stable
estimation. In particular, u′ is closely
akin to the φ of our Condition 2 for
stable estimation. In stable estimation,
the value of φ cancels out of all
calculations, but here, u′ is essential.
One way to arrive at u′ is to ask yourself
what probability you attach to a
small, but not microscopic, interval 
of values of p near r/n under the 
alternative hypothesis. Your reply 
will typically be vague, perhaps just 
a rough order of magnitude, but that 
may be enough to settle whether the 
experiment has strikingly confirmed 
or strikingly discredited the null 
hypothesis. 


In principle, any positive value of u′ can
arise, but values between .1 and 10 promise
to predominate in practice. The reasons for
this are complex and not altogether clear to
us, but something instructive can be said
about them here. To begin with, since the
integral of u(p|H₁) is 1, u(p|H₁) cannot
exceed 10 throughout an interval as long as
1/10. Therefore, if u(r/n|H₁) is much greater
than 10, u(p|H₁) must undergo great diminution
quite close to r/n, and the approximation
will not be applicable unless v(r|p, n) is very
violent indeed, which can happen only if r
and n − r are very large, perhaps several
thousands.

Typically, u(p|H₁) attains its maximum at
p₀, or at any rate is rather substantial near
there; its maximum is necessarily at least 1,
because its integral from 0 to 1 is 1. Therefore,
should the null hypothesis obtain,
u(r/n|H₁) is most unlikely to be as small as
1/10. Under the alternative hypothesis, you
must, according to a simple mathematical
argument, attach probability less than 1/10
to the set of those values of p for which
u(p|H₁) is less than 1/10. Under a reasonably
diffuse alternative hypothesis, the probability
of an r for which u(r/n|H₁) is at most
1/10 is much the same as the probability of
a p for which u(p|H₁) is at most 1/10. Thus,
under either hypothesis, you are unlikely to
encounter an r for which u(r/n|H₁) < 1/10.
You are actually much more unlikely yet to
encounter such an r for which the approximation
is applicable.


In this particular example of a 
person aiming at a line with a stylus, 
structuring your opinion in terms of a 
sharp null hypothesis and a diffuse 
alternative is rather forced. More 
realistically, your prior opinion is 
simply expressed by a density with a 
rather sharp peak, or mode, at 
p₀ = 1/2, and your posterior distribution
will tend to have two modes,
one at p₀ and the other about at r/n.
Nonetheless, an arbitrary structuring
of the prior density as a weighted
average, or probability mixture, of
two densities, one practically concentrated
at p₀ and the other somewhat
diffuse, may be a useful approach. 

Conversely, even if the division is
not artificial, the unified approach is 
always permissible. This may help 
emphasize that determining the pos- 
terior odds is seldom the entire aim 
of the analysis. The posterior dis- 
tribution of p under the alternative 
hypothesis is also important. This 
density u(p|r, n, H₁) is determined by
Bayes’ theorem from the datum (r, n)
and the alternative prior density
u(p|H₁); for this, what the hypothesis
H₀ is, or how probable you consider
it either before or after the experiment 
are all irrelevant. As in any other 
estimation problem, the principle 
of stable estimation may provide 
an adequate approximation for 
u(p|r, n, H₁). If in addition, the null
hypothesis is strongly discredited by
the datum, then the entire posterior
density u(p|r, n) will be virtually
unimodal and identifiable with
u(p|r, n, H₁) for many purposes. In
fact, the outcome of the test in this 
case is to show that stable estimation 
(in particular our Assumption 3) is 
applicable without recourse to As- 
sumption 3’. 

The stable-estimation density for
this Bernoullian problem is of course
\(p^r (1 - p)^{n-r}\) multiplied by the appropriate
normalizing constant, which
is implicit in the second equality of
Equation 11. This is an instance of
the beta density of indices a and b,

\[\frac{(a + b - 1)!}{(a - 1)!\,(b - 1)!}\, p^{a-1} (1 - p)^{b-1}.\]

In this case, a = r + 1 and b = (n − r) + 1.
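
As a brief illustration, the beta density with integer indices can be written out and its normalizing constant compared with the factor (n + 1) times the binomial coefficient from Equation 11. The function names and the values r = 32, n = 50 are illustrative assumptions.

```python
from math import comb, factorial

def beta_density(p: float, a: int, b: int) -> float:
    """Beta density with positive integer indices a and b."""
    const = factorial(a + b - 1) / (factorial(a - 1) * factorial(b - 1))
    return const * p ** (a - 1) * (1 - p) ** (b - 1)

def stable_estimation_density(p: float, r: int, n: int) -> float:
    """Posterior density of p under stable estimation: beta with
    a = r + 1 and b = (n - r) + 1."""
    return beta_density(p, r + 1, n - r + 1)

r, n = 32, 50
# The normalizing constant equals (n + 1) * C(n, r), as in Equation 11.
assert factorial(n + 1) // (factorial(r) * factorial(n - r)) == (n + 1) * comb(n, r)
print(stable_estimation_density(r / n, r, n))
```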

In view of the rough rule of thumb 
that u′ is of the order of magnitude
of 1, the factor (n + 1)P(r|p₀, n) is
at least a crude approximation to
L(p₀; r, n) and is of interest in any
case as the relatively public factor
in L(p₀; r, n) and hence in Ω(H₀|r, n).
The first three rows of Table 1 show 
hypothetical data for four different 
experiments of this sort (two of them 
on a large scale) along with the 
corresponding likelihood ratios for the 
uniform alternative prior. The num- 
bers in Table 1 are, for illustration, 
those that would, for the specified 
number of observations, barely lead 
to rejection of the null hypothesis, 
p = .5, by a classical two-tailed test 
at the .05 level. 

TABLE 1

LIKELIHOOD RATIOS UNDER THE UNIFORM ALTERNATIVE PRIOR AND MINIMUM
LIKELIHOOD RATIOS FOR VARIOUS VALUES OF n AND FOR VALUES
OF r JUST SIGNIFICANT AT THE .05 LEVEL

                              Experiment number
               1        2        3        4          ∞
n              50       100      400      10,000     (very large)
r              32       60       220      5,098      (n + 1.96√n)/2
L(p₀; r, n)    .8178    1.092    2.167    11.689     .11689√n
L_min          .1372    .1335    .1349    .1465      .1465

How would a Bayesian feel about
the numbers in Table 1? Remember
that a likelihood ratio greater than 1
leaves one more confident of the null
hypothesis than he was to start with,
while a likelihood ratio less than 1
leaves him less confident of it than
he was to start with. Thus Experi- 
ment 1, which argues against the null 
hypothesis more persuasively than the 
others, discredits it by little more than 
a factor of 1.27 to 1 (assuming 
u' = 1) instead of the 20 to 1 which 
a naive interpretation of the .05 level 
might (contrary to classical as well as 
Bayesian theory) lead one to expect. 
More important, Experiments 3 and 
4, which would lead a classical stat- 
istician to reject the null hypothesis, 
leave the Bayesian who happens to 
have a roughly uniform prior, more 
confident of the null hypothesis than 
he was to start with. And Experiment 
4 should reassure even a rather 
skeptical person about the truth of 
the null nypothesigiliFiere, then, is a 
blunt practical contradiction between 
conclusions produced by classical and 
Bayesian rules for statistical inference. 
Though the Bernoullian example is 
special, particularly in that it offers 
relatively general grounds for w’ to be 
about 1, classical procedures quite 
typically are, from a Bayesian point 
of view, far too ready to reject 
null hypotheses. 

Approximation in the spirit of 
stable estimation is by no means the 
last word on evaluating a likelihood 
ratio. Sometimes, as when r or n − r
are too small, it is not applicable at
all, and even when it might otherwise 
be applicable, subjective haze and 


interpersonal disagreement affecting 
u' may frustrate its application. The 
principal alternative devices known 
to us will be at least mentioned in 
connection with the present example, 
and most of them will be explored 
somewhat more in connection with 
later examples. 

It is but an exercise in differential
calculus to see that p^r (1 − p)^(n − r)
attains its maximum at p = r/n.
Therefore, regardless of what u(p)
actually is, the likelihood ratio in
favor of the null hypothesis is at least

$$
L_{\min} = \frac{p_0^{\,r}\,(1 - p_0)^{n-r}}{(r/n)^{r}\,(1 - r/n)^{n-r}}.
$$

If this number is not very small, then 
everyone (who does not altogether 
reject Bayesian ideas) must agree that 
the null hypothesis has not been 
greatly discredited. For example, 
since Lmin in Table 1 exceeds .05, it is 
impossible for the experiments con- 
sidered there that rejection at the 5% 
significance level should ever cor- 
respond to a nineteenfold diminution 
of the odds in favor of the null 
hypothesis. It is mathematically 
possible but realistically preposterous 
for Lmin to be the actual likelihood 
ratio. That could occur only if your
u(p|H1) were concentrated at r/n,
and prior views are seldom so prescient.
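
The entries of Table 1 can be checked numerically. The following Python sketch assumes, as the text does, that u' = 1, so that the uniform-prior likelihood ratio is (n + 1)P(r|p0, n); the function names `lr_uniform` and `lr_min` are illustrative and do not come from the paper. Logarithms are used so that the large-n experiment does not underflow.

```python
from math import exp, lgamma, log

def lr_uniform(n, r, p0=0.5):
    """(n + 1) * P(r | p0, n): likelihood ratio for H0 under a uniform
    alternative prior, taking u' = 1 (an assumption stated in the text)."""
    log_binom = lgamma(n + 1) - lgamma(r + 1) - lgamma(n - r + 1)
    log_p = log_binom + r * log(p0) + (n - r) * log(1 - p0)
    return (n + 1) * exp(log_p)

def lr_min(n, r, p0=0.5):
    """Smallest possible likelihood ratio: all alternative prior mass at r/n.
    Requires 0 < r < n so that both logarithms are defined."""
    p_hat = r / n
    log_num = r * log(p0) + (n - r) * log(1 - p0)
    log_den = r * log(p_hat) + (n - r) * log(1 - p_hat)
    return exp(log_num - log_den)

if __name__ == "__main__":
    for n, r in [(50, 32), (100, 60), (400, 220), (10_000, 5_098)]:
        print(n, r, round(lr_uniform(n, r), 4), round(lr_min(n, r), 4))
```

The printed values should agree with Table 1 to within the rounding and approximation used when the table was prepared.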


It is often possible to name, whether for 
yourself alone or for “the public,” a number 
u* that is a generous upper bound for u(p|H1),
that is, a u* of which you are quite confident
that u(p|H1) < u* for all p (in the interval
from 0 to 1). A calculation much like
Equations 11 and 13 shows that if u* is
substituted for u' in Equation 13, the
resultant fraction is less than the actual likeli-
hood ratio. If this method of finding a lower 
bound for L is not as secure as that of the 
preceding paragraph, it generally provides a 
better, that is, a bigger, one. The two 
methods can be blended into one which is 
always somewhat better than either, as will 
be illustrated in a later example. 

Upper, as well as lower, bounds for L are 
important. One way to obtain one is to 
paraphrase the method of the preceding 
paragraph with a lower bound rather than an 
upper bound for u(p). This method will 
seldom be applicable as stated, since u(p) is 
likely to be very small for some values of p, 
especially values near 0 or 1. But refinements 
of the method, illustrated in later examples, 
may be applicable. 

Another avenue, in case u(p|H1) is known
with even moderate precision but is not gentle
enough for the techniques of stable estimation,
is to approximate u(p|H1) by the beta
density for some suitable indices a and b.
This may be possible since the two adjustable
indices of the beta distribution provide considerable
latitude and since what is required
of the approximation is rather limited. It
may be desirable, because beta densities are
conjugate to Bernoullian experiments. In
fact, if u(p|H1) is a beta distribution with
indices a and b, then u(p|r, n, H1) is also
a beta density, with indices a + r and
b + (n − r). The likelihood ratio in this
case is

$$
\frac{(a - 1)!\,(b - 1)!\,(n + a + b - 1)!}{(a + r - 1)!\,(b + n - r - 1)!\,(a + b - 1)!}\; p_0^{\,r}\,(1 - p_0)^{n-r}.
$$

These facts are easy consequences of the
definite integral on which Equation 11 is
based. More details will be found in Chapter
9 of Raiffa and Schlaifer (1961).
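
As a rough illustration of the beta-prior likelihood ratio, the sketch below evaluates the factorial expression through log-gamma functions, using (k − 1)! = Γ(k). The function name `lr_beta` is hypothetical. With a = b = 1 the beta prior is uniform, and the result should reduce to (n + 1)P(r|p0, n).

```python
from math import exp, lgamma, log

def lr_beta(a, b, n, r, p0=0.5):
    """Likelihood ratio for H0: p = p0 against a beta(a, b) alternative prior,
    given r successes in n Bernoullian trials."""
    log_l = (lgamma(a) + lgamma(b) + lgamma(n + a + b)
             - lgamma(a + r) - lgamma(b + n - r) - lgamma(a + b)
             + r * log(p0) + (n - r) * log(1 - p0))
    return exp(log_l)

if __name__ == "__main__":
    # Uniform prior (a = b = 1) for Experiment 1 of Table 1.
    print(round(lr_beta(1, 1, 50, 32), 4))
```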


A one-dimensional normal example. 
We examine next one situation in 
which classical statistics prescribes a 
two-tailed t test. As in our discussion
of normal measurements in the section
on distribution theory, we will consider
one normally distributed ob-
servation with known variance; as 
before, this embraces by approxima- 
tion the case of 25 or more observa- 
tions of unknown variance and many 
other applications such as the Ber- 
noullian experiments. 


According to Weber’s Law, the ratio of the 
just noticeable difference between two sensory 
magnitudes to the magnitude at which the 
just noticeable difference is measured is a 
constant, called the Weber fraction. The law 
is approximately true for frequency dis- 
crimination of fairly loud pure tones, say 
between 2,000 and 5,000 cps; the Weber 
fraction is about .0020 over this fairly wide 
range of frequencies. Psychophysicists dis- 
agree about the nature and extent of inter- 
action between different sense modalities. 
You might, therefore, wonder whether there 
is any difference between the Weber fraction 
at 3,000 cps for subjects in a lighted room and 
in complete darkness. Since search for such 
interactions among modalities has failed more 
often than it has succeeded, you might give 
considerable initial credence to the null hy- 
pothesis that there will be no (appreciable) 
difference between the Weber fractions ob- 
tained in light and in darkness. However, 
such effects might possibly be substantial. If 
they are, light could facilitate or could hinder 
frequency discrimination. Some work on 
arousal might lead you to expect facilitation ; 
the idea of visual stimuli competing with 
auditory stimuli for attention might lead you 
to expect hindrance. If the null hypothesis is 
false, you might consider any value between 
.0010 and .0030 of the Weber fraction obtained
in darkness to be roughly as plausible
as any other value in that range. Your
instruments and procedure permit determination
of the Weber fraction with a standard
deviation of 3.33 × 10^-5 (a standard deviation
of .1 cps at 3,000 cps, which is not too
implausible if your procedures permit re- 
peated measurements and are in other ways 
extremely accurate). Thus the range of 
plausible values is 60 standard deviations 
wide—quite large compared with similar 
numbers in other parts of experimental 
psychology, though small compared with 
many analogous numbers in physics or
chemistry. Such a small standard deviation
relative to the range of plausible values is not
indispensable to the example, but it is
convenient and helps make the example
congenial to both physical and social scientists.
If the standard deviation were more
than 10^-4, however, the eventual application
of the principle of stable estimation to the 
example would be rather difficult to justify. 

A full Bayesian analysis of this problem 
would take into account that each observation 
consists of two Weber fractions, rather than 
one difference between them. However, as 
classical statistics is even too ready to agree, 
little if any error will result from treating the 
difference between each Weber fraction de- 
termined in light and the corresponding Weber 
fraction determined in darkness as a single 
observation. In that formulation, the null 
hypothesis is that the true difference is 0, 
and the alternative hypothesis envisages the 
true difference as probably between −.0010
and +.0010. The standard deviation of the
measurement of the difference, if the measurements
in light and darkness are independent,
is 1.414 × 3.33 × 10^-5 = 4.71 × 10^-5. Since
our real concern is exclusively with differences
between Weber fractions and the standard
deviation of these differences, it is convenient
to measure every difference between Weber
fractions in standard deviations, that is, to
multiply it by 21,200 (= 1/σ). In these new
units, the plausible range of observations is 
about from —21 to +21, and the standard 
deviation of the differences is 1. The rest of 
the discussion of this example is based on 
these numbers alone. 
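
The change of units described above is simple arithmetic, sketched here in Python. The variable names are illustrative; the inputs (.1 cps at 3,000 cps, a plausible range of ±.0010) are taken from the text.

```python
from math import sqrt

sd_single = 0.1 / 3000          # sd of one Weber fraction, about 3.33e-5
sd_diff = sqrt(2) * sd_single   # sd of a difference of two independent measurements
scale = 1 / sd_diff             # multiplier that expresses differences in sd units
half_range = 0.0010 * scale     # the plausible range +/- .0010 in the new units

if __name__ == "__main__":
    print(f"sd of difference: {sd_diff:.3e}")
    print(f"scale (1/sigma):  {scale:,.0f}")
    print(f"plausible range:  -{half_range:.1f} to +{half_range:.1f}")
```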


The example specified by the last 
two paragraphs has a sharp null hy- 
pothesis and a rather diffuse sym- 
metric alternative hypothesis with 
good reasons for associating sub- 
stantial prior probability with each. 
Although realistically the null hy- 
pothesis cannot be infinitely sharp, 
calculating as though it were is an 
excellent approximation. Realism, 
and even mathematical consistency, 
demands far more sternly that the 
alternative hypothesis not be utterly 
diffuse (that is, uniform from −∞ to
+∞); otherwise, no measurement of
the kind contemplated could result in 
any opinion other than certainty that 
the null hypothesis is correct. 

Having already assumed that the 
distribution of the true parameter or 
parameters under the null hypothesis 
is narrow enough to be treated as 
though it were concentrated at the
single point 0, we also assume that the 
distribution of the datum given the 
parameter is normal with moderate 
variance. By moderate we mean 
large relative to the sharp null hy- 
pothesis but (in most cases) small 
relative to the distribution under the 
alternative hypothesis of the true 
parameter. 

Paralleling our treatment of the 
Bernoullian example, we shall begin, 
after a neutral formulation, with an 
approximation akin to stable estima- 
tion, then explore bounds on the 
likelihood ratio L that depend on far 
less stringent assumptions, and finally 
explore normal prior distributions. 

Without specifying the form of the
prior distribution under the alternative
hypothesis, the likelihood ratio
in the Weber-fraction example under
discussion is

$$
L(\lambda_0; x) = \frac{\dfrac{1}{\sigma}\,\varphi\!\left(\dfrac{x - \lambda_0}{\sigma}\right)}{\displaystyle\int \dfrac{1}{\sigma}\,\varphi\!\left(\dfrac{x - \lambda}{\sigma}\right) u(\lambda \mid H_1)\, d\lambda}. \qquad [14]
$$

The numerator is the density of the
datum x under the null hypothesis; σ
is the standard deviation of the
measuring instrument. The denominator
is the density of x under the
alternative hypothesis. The values of
λ are the possible values of the actual
difference under the alternative hypothesis,
and λ0 is the null value, 0.
φ[(x − λ)/σ] is the ordinate of the
standard normal density at the point
(x − λ)/σ. Hereafter, we will use
the familiar statistical abbreviation
t = (x − λ0)/σ for the t of the
classical t test. Finally, u(λ|H1) is
the prior probability density of λ
under the alternative hypothesis.

If u(λ|H1) is gentle in the neighborhood
of x and not too violent elsewhere,
a reasonable approximation to
Equation 14, akin to the principle of
stable estimation, is

$$
L(\lambda_0; x) = \frac{\varphi(t)}{\sigma\, u(x)}. \qquad [15]
$$


According to a slight variation of the 
principle, already used in the Bernoul- 
lian example, near linearity may 
justify this approximation even better 
than near constancy does. Since σ is
measured in the same units as x or λ,
say, degrees centigrade or cycles per
second, and u(x) is probability per
degree centigrade or per cycle per
second, the product σu(x) (in the
denominator of Equation 15) is dimensionless.
Visualizing σu(x) as a
rectangle of base σ, centered at x, and
height u(x), we see σu(x) to be
approximately your prior probability 
for an interval of length o in the 
region most favored by the data. 


Consider an example. If λ0 = 0 and σ = 1,
then an observation of 2.58 would be significantly
different from the null hypothesis at
the .01 level of a classical two-tailed t test.
If your alternative density were uniform over 
the range —21 to +21, then its average 
height would be about .024. But it is not 
uniform, and it is presumably somewhat 
higher near 0 than it is farther away. Perhaps 
under the alternative hypothesis, you would 
distinctly not attach more than 1/20 prior 
probability to any region one unit wide, and 
do attach about that much prior probability 
to such intervals in the immediate vicinity 
of the null value. According to the table of 
normal ordinates, φ(2.58) = .0143, so the
likelihood ratio is about .286. Thus for the 
Bayesian, as for the classical statistician, the 
evidence here tells against the null hypothesis, 
but the Bayesian is not nearly so strongly 
persuaded as the classical statistician appears 
to be. The datum 1.96 is just significant at 
the .05 level of a two-tailed test. But the 
likelihood ratio is 1.17. This datum, which 
leads to a .05 classical rejection, leaves the 
Bayesian, with the prior opinion postulated, 
a shade more confident of the null hypothesis 
than he was to start with. The overreadiness 
of classical procedures to reject null hy- 
potheses, first illustrated in the Bernoullian 
example, is seen again here; indeed, the two 
examples are really much the same in almost
all respects. This sort of calculation, in- 
cidentally, is a more rigorous equivalent of the 
intuitive argument given just before the 
discussion of the Bernoullian example. 
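
The two likelihood ratios quoted in this example follow directly from the stable-estimation approximation φ(t)/(σu(x)). A minimal Python sketch, assuming σ = 1 and a prior density of 1/20 near the data as in the text (the function names are illustrative):

```python
from math import exp, pi, sqrt

def phi(z):
    """Ordinate of the standard normal density at z."""
    return exp(-0.5 * z * z) / sqrt(2 * pi)

def lr_stable(t, sigma_u):
    """Stable-estimation approximation phi(t) / (sigma * u(x)), where sigma_u
    is the prior probability of an interval of length sigma near the datum."""
    return phi(t) / sigma_u

if __name__ == "__main__":
    for t in (2.58, 1.96):
        print(t, round(lr_stable(t, 1 / 20), 3))
```

A ratio below 1 tells against the null hypothesis and a ratio above 1 tells in its favor, so the datum 1.96 slightly supports the null hypothesis under these assumptions.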

Lower bounds on L. An alternative
when u(λ|H1) is not diffuse enough to
justify stable estimation is to seek
bounds on L. Imagine all the density
under the alternative hypothesis concentrated
at x, the place most favored
by the data. The likelihood ratio is
then

$$
L_{\min} = \frac{\varphi(t)}{\varphi(0)} = e^{-t^2/2}.
$$
This is of course the very smallest 
likelihood ratio that can be associated 
with ¢. Since the alternative hy- 
pothesis now has all its density on one 
side of the null hypothesis, it is 
perhaps appropriate to compare the 
outcome of this procedure with the 
outcome of a one-tailed rather than a 
two-tailed classical test. At the one- 
tailed classical .05, .01, and .001 
points, Lmin is .26, .066, and .0085,
respectively. Even the utmost gen- 
erosity to the alternative hypothesis 
cannot make the evidence in favor of 
it as strong as classical significance 
levels might suggest. Incidentally, 
the situation is little different for a 
two-tailed classical test and a prior 
distribution for the alternative hy- 
pothesis concentrated symmetrically 
at a pair of points straddling the null 
value. If the prior distribution under 
the alternative hypothesis is required 
to be not only symmetric around the 
null value but also unimodal, which 
seems very safe for many problems, 
then the results are too similar to
those obtained later for the smallest 
possible likelihood ratio obtainable 
with a symmetrical normal prior 
density to merit separate presentation 
here. 

If you know that your prior density u(λ|H1)
never exceeds some upper bound u*, you can
improve, that is, increase, the crude lower
bound Lmin. The prior distribution most
favorable to the alternative hypothesis, given
that it nowhere exceeds u*, is a rectangular
distribution of height u* with x as its midpoint.
Therefore

$$
L(\lambda_0; x) \ge \frac{\varphi(t)}{\sigma u^{*}\left[\Phi\!\left(\dfrac{1}{2\sigma u^{*}}\right) - \Phi\!\left(-\dfrac{1}{2\sigma u^{*}}\right)\right]}
= \frac{L_{\min}}{\sqrt{2\pi}\,\sigma u^{*}\left[\Phi\!\left(\dfrac{1}{2\sigma u^{*}}\right) - \Phi\!\left(-\dfrac{1}{2\sigma u^{*}}\right)\right]}, \qquad [16]
$$

where Φ is the standard normal cumulative
function. Not only is this lower bound better
than Lmin, no matter how large u*, it also
improves with decreasing σ, as is realistic.
The improvement over Lmin is negligible if
σu* ≥ 0.7.

Either directly or by recognizing that the
square bracket in Inequality 16 is less than 1,
it is easy to derive a cruder but simpler bound,
which is sometimes better than Lmin,

$$
L(\lambda_0; x) \ge \frac{\varphi(t)}{\sigma u^{*}}. \qquad [17]
$$

A counterpart of this more elementary bound
was exhibited in the Bernoullian example.
When σu* is less than about .2, the square
bracket in Inequality 16 is negligibly different
from 1, so Inequality 16 reduces to Inequality
17.

In the present example, perhaps assignment 
of a probability as high as .1 to any interval 
as short as one standard deviation, given that 
light does materially affect frequency dis- 
crimination, may be distinctly contrary to 
your actual opinion. If so, you are entitled to 
apply Inequality 16 (and of course also 
Inequality 17) with u* = .1 and σ = 1. The
minimal likelihood ratios obtained from Inequality 16
(with σu* = .1) corresponding to
values of t just significant at the .05, .01, and
.001 levels by classical two-tailed tests are
.58, .14, and .018, respectively. These bounds,
though still not high, are considerably higher
than Lmin.
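
The bound with a capped prior density can be evaluated as follows. This is a sketch of the rectangular-prior argument described above, not code from the paper: the function name `lower_bound_capped` is hypothetical, and the rectangle of height u* centered at x is taken to have width 1/u* so that it integrates to 1.

```python
from statistics import NormalDist

def lower_bound_capped(t, sigma_u_star):
    """Lower bound on L when the alternative prior density never exceeds u*.
    sigma_u_star is the dimensionless product sigma * u*."""
    nd = NormalDist()
    half_width = 1 / (2 * sigma_u_star)        # half-width of the rectangle, in sigma units
    bracket = nd.cdf(half_width) - nd.cdf(-half_width)
    return nd.pdf(t) / (sigma_u_star * bracket)

if __name__ == "__main__":
    # Two-tailed .05, .01, and .001 points with sigma * u* = .1.
    for t in (1.960, 2.576, 3.291):
        print(t, round(lower_bound_capped(t, 0.1), 3))
```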


Upper bounds on L. In order to 
discredit a null hypothesis, it is useful 
to find a practical upper bound on the 
likelihood ratio L, which can result in 
the conclusion that L is very small.
It is impossible that u(λ|H1) should
exceed some positive number for all
λ, but you may well know plainly
that u(λ|H1) ≥ u_* > 0 for all λ in
some interval, say of length 4σ, centered
at x. In this case,

$$
L(\lambda_0; x) \le \frac{\dfrac{1}{\sigma}\,\varphi(t)}{\displaystyle\int_{x-2\sigma}^{x+2\sigma} \dfrac{1}{\sigma}\,\varphi\!\left(\dfrac{x - \lambda}{\sigma}\right) u_{*}\, d\lambda}
= \frac{\varphi(t)}{\sigma u_{*}\left[\Phi(2) - \Phi(-2)\right]}
= \frac{1.05\, e^{-t^2/2}}{\sqrt{2\pi}\,\sigma u_{*}}
= \frac{0.42\, e^{-t^2/2}}{\sigma u_{*}}
= \frac{0.42\, L_{\min}}{\sigma u_{*}}.
$$

If, for example, you attach as much
probability as .01 to the intervals of
length σ near x, your likelihood ratio
is at most 42 Lmin.

For t's classically significant at the 
.05, .01, and .001 levels, your likeli- 
hood ratio is correspondingly at most 
10.9, 2.8, and .36. This procedure 
can discredit null hypotheses quite 
strongly; t's of 4 and 5 lead to upper 
bounds on your likelihood ratio of 
.014 and .00016, insofar as the normal 
model can be taken seriously for such 
large t's. 
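
The upper bounds quoted above can be reproduced approximately with the short sketch below. It assumes, as in the text, a prior density of at least u_* over an interval of length 4σ centered at x, with σu_* = .01; the function name `upper_bound` is illustrative. The text rounds the constant to 42, so small discrepancies in the last digit are expected.

```python
from statistics import NormalDist

def upper_bound(t, sigma_u_lower, k=2.0):
    """Upper bound on L when the alternative prior density is at least u_*
    within k sigma of the datum; sigma_u_lower is the product sigma * u_*."""
    nd = NormalDist()
    coverage = nd.cdf(k) - nd.cdf(-k)          # about .9545 for k = 2
    return nd.pdf(t) / (sigma_u_lower * coverage)

if __name__ == "__main__":
    for t in (1.645, 2.326, 3.090, 4.0, 5.0):
        print(t, f"{upper_bound(t, 0.01):.5f}")
```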

Normal alternative priors. Since 
normal densities are conjugate to 
normal measurements, it is natural 
to study the assumption that u(λ|H1)
is a normal density. This assumption 
may frequently be adequate as an 
approximation, and its relative mathe- 
matical simplicity paves the way to 
valuable insights that may later be 
substantiated with less arbitrary as- 
sumptions. In this paper we explore 
not all normal alternative priors but 
only those symmetrical about λ0,
which seem especially important. 


Let u(λ|H1), then, be normal with mean λ0
and with some standard deviation τ. Equation 14
now specializes to

$$
L(\alpha, t) = \frac{\varphi(t)}{\alpha\,\varphi(\alpha t)}, \qquad [18]
$$

where

$$
\alpha = \frac{\sigma}{\sqrt{\sigma^2 + \tau^2}} = \frac{\sigma/\tau}{\sqrt{1 + (\sigma/\tau)^2}}.
$$

Plainly, α is a function of σ/τ and vice versa;
for small values of either, the difference between
α and σ/τ is negligible. We emphasize
α rather than the intuitively more appealing
σ/τ because α leads to simpler equations. Of
course, α is less than one, typically much less.
Writing the normal density in explicit form,

$$
L(\alpha, t) = \frac{1}{\alpha}\,\exp\!\left[-\tfrac{1}{2}\,t^2\,(1 - \alpha^2)\right]. \qquad [19]
$$

Table 2 shows numerical values of L(α, t) for
some instructive values of α and for values of t
corresponding to familiar two-tailed classical
significance levels. The values of α between
.01 and .1 portray reasonably precise experiments;
the others included in Table 2 are
instructive as extreme possibilities. Table 2
again illustrates how classically significant
values of t can, in realistic cases, be based on
data that actually favor the null hypothesis.

For another comparison of Equation 18
with classical tests consider that (positive)
value t0 of t for which L is 1. If L is 1, then
the posterior odds for the two hypotheses will
equal the prior odds; the experiment will
leave opinion about H0 and H1 unchanged,
though it is bound to influence opinion about
λ given H1. Taking natural logarithms of
Equation 19 for t = t0,

$$
-\ln\alpha - \tfrac{1}{2}\,t_0^{2}\,(1 - \alpha^2) = 0,
\qquad
t_0 = \left[\frac{-2\ln\alpha}{1 - \alpha^2}\right]^{1/2}.
$$

If α is small, say less than .1, then 1 − α²
is negligibly different from 1, and so
t0 ≈ √(−2 ln α). The effect of using this approximation
can never be very bad; for the
likelihood ratio actually associated with the
approximate value of t0 cannot be less than 1
or greater than 1.202. Table 3 presents a few
actual values of t0 and their corresponding
two-tailed significance levels. At values of t
slightly smaller than the break-even values
in Table 3 classical statistics more or less
vigorously rejects the null hypothesis, though
the Bayesian described by α becomes more
confident of it than he was to start with.

If t = 0, that is, if the observation happens
to point exactly to the null hypothesis,
L = 1/α; thus support for the null hypothesis
can be very strong, since α might well be about
.01. In the example, you perhaps hope to
confirm the null hypothesis to everyone's
satisfaction, if it is in fact true. You will
therefore try hard to make σ small enough so
that your own α and those of your critics will
be small. In the Weber-fraction example,
α ≈ .077 (calculated by assuming that 90%
of the prior probability under the alternative
hypothesis falls between −21 and +21;
assuming normality, it follows that τ = 12.9).
If t = 0, then L is 12.9, persuasive but not
irresistible evidence in favor of the null
hypothesis. For α = .077, t0 is 2.3, just
about the .02 level of a classical two-tailed
test. Conclusion: An experiment strong
enough to lend strong support to the null
hypothesis when t = 0 will mildly support the
null hypothesis even when classical tests
would strongly reject it.


TABLE 2

VALUES OF L(α, t) FOR SELECTED VALUES OF α AND FOR VALUES OF t
CORRESPONDING TO FAMILIAR TWO-TAILED SIGNIFICANCE LEVELS

                            t and significance level
  α        σ/τ       1.645     1.960     2.576     3.291     3.891
                     .10       .05       .01       .001      .0001

 .0001     .0001     2,585     1,465     362       44.6      5.16
 .001      .0010     259       147       36.2      4.46      .516
 .01       .0100     25.9      14.7      3.63      .446      .0516
 .025      .0250     10.4      5.87      1.45      .179      .0207
 .05       .0501     5.19      2.94      .731      .0903     .0105
 .075      .0752     3.47      1.97      .492      .0612     .00718
 .1        .1005     2.62      1.49      .375      .0470     .00556
 .15       .1517     1.78      1.02      .260      .0336     .00408
 .2        .2041     1.36      .791      .207      .0277     .00349
 .5        .5774     .725      .474      .166      .0345     .00685
 .9        2.0647    .859      .771      .592      .397      .264
 .99       7.0179    .983      .972      .946      .907      .869


TABLE 3

VALUES OF t0 AND THEIR SIGNIFICANCE
LEVELS FOR NORMAL ALTERNATIVE
PRIOR DISTRIBUTIONS FOR
SELECTED VALUES OF α

  α         t0        Significance level

 .1         2.157     .031
 .05        2.451     .014
 .01        3.035     .0024
 .001       3.718     .00020
 .0001      4.292     .000018
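
The normal-prior likelihood ratio, the break-even value of t, and the Weber-fraction figures can all be computed in a few lines. The sketch below uses the explicit form L(α, t) = (1/α) exp[−½t²(1 − α²)] with α = σ/√(σ² + τ²); the function names are illustrative, and τ = 12.9 with σ = 1 is the value assumed in the text.

```python
from math import exp, log, sqrt

def alpha_from_ratio(sigma_over_tau):
    """alpha = sigma / sqrt(sigma^2 + tau^2), written in terms of sigma/tau."""
    return sigma_over_tau / sqrt(1 + sigma_over_tau ** 2)

def lr_normal(alpha, t):
    """L(alpha, t) = (1/alpha) * exp(-t^2 (1 - alpha^2) / 2)."""
    return exp(-0.5 * t * t * (1 - alpha ** 2)) / alpha

def break_even_t(alpha):
    """Positive t at which L(alpha, t) = 1."""
    return sqrt(-2 * log(alpha) / (1 - alpha ** 2))

if __name__ == "__main__":
    print(round(lr_normal(0.01, 1.960), 1))      # compare with Table 2
    print(round(break_even_t(0.1), 3))           # compare with Table 3
    alpha = alpha_from_ratio(1 / 12.9)           # Weber-fraction example
    print(round(alpha, 3), round(lr_normal(alpha, 0), 1), round(break_even_t(alpha), 1))
```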

If you are seriously interested in supporting
the null hypothesis if it is true (and you may
well be, valid aphorisms about the perishability
of hypotheses notwithstanding), you
should so design your experiment that even a
t as large as 2 or 3 strongly confirms the null
hypothesis. If α is .0001, L is more than 100
for any t between −3 and +3. Such small
α's do not occur every day, but they are
possible. Maxwell’s prediction of the equality 
of the “two speeds of light” might be an 
example. A more practical way to prove a 
null hypothesis may be to investigate several, 
not just one of its numerical consequences. 
It is not clear just what sort of evidence 
classical statistics would regard as strong 
confirmation of a null hypothesis. (See 
however Berkson, 1942.) 

What is the smallest likelihood ratio Lnormin
(the minimum L for a symmetrical normal
prior) that can be attained for a given t by
artificial choice of α? It follows from Equation 19
that L is minimized at α = 1/|t|,
provided |t| ≥ 1, and at the unattainable
value α = 1, otherwise. Thus

$$
L_{\text{normin}} =
\begin{cases}
\sqrt{e}\,|t|\,e^{-t^2/2} = 1.65\,|t|\,e^{-t^2/2} & \text{for } |t| \ge 1, \\
1 & \text{for } |t| \le 1.
\end{cases}
$$

With any symmetric normal prior, any
|t| ≤ 1 speaks for the null hypothesis. So
Lnormin exceeds Lmin in all cases and exceeds
it by the substantial factor 1.65|t| if |t| ≥ 1.
Values of t corresponding to familiar two-tailed
significance levels and the corresponding
values of Lnormin are shown in Table 4.
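
As a check on the minimization, the sketch below evaluates the closed form for Lnormin and compares it with a brute-force search over α on a grid. The helper names are hypothetical, and the grid search is only an illustrative verification, not part of the original argument.

```python
from math import e, exp, sqrt

def lr_normal(alpha, t):
    """L(alpha, t) = (1/alpha) * exp(-t^2 (1 - alpha^2) / 2)."""
    return exp(-0.5 * t * t * (1 - alpha ** 2)) / alpha

def lr_normin(t):
    """Minimum of L(alpha, t) over 0 < alpha <= 1 (the value 1 at alpha = 1
    is an unattainable limit, reached only as tau shrinks to 0)."""
    a = abs(t)
    if a <= 1:
        return 1.0
    return sqrt(e) * a * exp(-0.5 * a * a)

def lr_normin_grid(t, steps=100_000):
    """Brute-force minimum over a grid of alpha values in (0, 1]."""
    return min(lr_normal(k / steps, t) for k in range(1, steps + 1))

if __name__ == "__main__":
    for t in (1.960, 2.576, 3.291, 3.891):
        print(t, f"{lr_normin(t):.5f}", f"{lr_normin_grid(t):.5f}")
```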


From this examination of one-dimensional
normally distributed observations,
we conclude that a t of 2
or 3 may not be evidence against the 
null hypothesis at all, and seldom if 
ever justifies much new confidence in 
the alternative hypothesis. This con- 
clusion has a melancholy side. The 
justification for the assumption of 
normal measurements must in the 
last analysis be empirical. Few 
applications are likely to justify 
using numerical values of normal 
ordinates more than three standard 
deviations away from the mean. And 
yet without those numerical values, 
the methods of this section are not 
applicable. In short, in one-dimen- 
sional normal cases, evidence that 
does not justify rejection of the null 
hypothesis by the interocular trau- 
matic test is unlikely to justify firm 
rejection at all. 

TABLE 4

VALUES OF Lnormin AND OF Lmin FOR VALUES
OF t CORRESPONDING TO FAMILIAR TWO-TAILED
SIGNIFICANCE LEVELS

  t        Significance level    Lnormin     Lmin

 1.960     .05                   .473        .146
 2.576     .01                   .154        .0362
 3.291     .001                  .0241       .00445
 3.891     .0001                 .00331      .000516


Haunts of χ² and F. Classical tests
of null hypotheses invoking the χ², and
closely related F, distributions are so
familiar that something must be said
here about their Bayesian counterparts.
Though often deceptively
oversimplified, the branches of statistics
that come together here are
immense and still full of fundamental
mysteries for Bayesians and classicists
alike (Fisher, 1925, see Ch. 4 and 5 in
the 1954 edition; Green & Tukey,
1960; Scheffé, 1959; Tukey, 1962).
We must therefore confine ourselves
to the barest suggestions.

Much of the subject can be reduced 
to testing whether several parameters
λ_i measured independently with
known variance σ² have a specified
common value. This multidimen- 
sional extension of the one-dimensional 
normal problem treated in the last 
section is so important that we shall 
return to it shortly. 


As is well known, the statistical theory of 
multidimensional normal measurement em- 
braces in a grand generalization that of 
normal regression and Model I analysis of 
variance (and covariance); a host of other 
topics can more or less faithfully be reduced 
to it by approximation (Cramér, 1946, Ch. 
29; Fisher, 1925, see Ch. 5 in the 1954 edition; 
Raiffa & Schlaifer, 1961). 

Approximation of multinomial by multi- 
dimensional normal measurements has also 
been the main approach to that large domain 
which classically evokes χ² tests of association
and goodness of fit (Cramér, 1946, Ch. 30; 
Fisher, 1925, see Ch. 4 in the 1954 edition; 
Jeffreys, 1939, see Section 4.1 in the 1961 edi- 
tion). We shall not attempt to enter into this 
topic here, but the suitably prepared reader 
will find the approximation, and the references 
just cited, helpful. 

One prominent classical application of the 
F distributions is testing whether two vari- 
ances of normally distributed measurements 
are equal, as in Model II analysis of variance. 
The interested reader will easily see what the 
Bayesian counterpart of this test is from 
examples of tests in earlier sections of this 
paper and from the discussion of Bayesian 
applications of the F distributions in Chapter 
12 of Raiffa and Schlaifer (1961). 

About the very important topic of Model 
III analysis of variance, that is, analysis of 
variance ostensibly justified by the random- 
ized allocation of treatments, we can say only 
that it is by no means so straightforward as is 

sometimes believed (Savage et al., 1962, pp. 
33, 34, 87-92, and references cited there). 


Multidimensional normal measure- 
ments and a null hypothesis. For those 
who may be interested in some
relatively technical and tentative sug- 
gestions, we return in this section to 
the basic multidimensional normal 
testing problem that was defined in 
the last section. 

For simplicity, and with the same 
justification as in the one-dimensional 
case, we shall assume that the variance 
σ² is known. The extension to unknown
variance, in which the multivariate
normal distribution is replaced
by multivariate t distributions and
the χ² distributions are replaced by
F distributions will be clear to 
many readers, especially on reference 
to Chapter 12 of Raiffa and Schlaifer 
(1961). 

Let λ be an unknown vector in
n-dimensional Euclidean space, and
suppose that, given λ, the measurement
x is a vector spherically normally
distributed around λ with known
variance σ². The likelihood ratio for
the null hypothesis that λ = λ0 is
then evidently

$$
L(\lambda_0; x) = \frac{\sigma^{-n}\,\varphi\!\left(\dfrac{x - \lambda_0}{\sigma}\right)}{\displaystyle\int \sigma^{-n}\,\varphi\!\left(\dfrac{x - \lambda}{\sigma}\right) u(\lambda \mid H_1)\, d\lambda}, \qquad [21]
$$

where φ is the standard n-dimensional
normal density. Equation 21 simply
does in n dimensions what Equation
14 did in one. The n-dimensional
generalizations of the suggestions
already made for appraising L in the
one-dimensional problem are so natural
that we shall be able to indicate
them very briefly, and we shall hardly
introduce any essentially new suggestions
here. There is one important
practical change with increasing n;
certain methods that would be frequently
applicable for small n become
increasingly useless with large n.
If u(λ|H1) is sufficiently gentle and
is approximately equal to u' near x,
then, in analogy with Equation 15,
the ideas of stable estimation permit
the approximation

$$
L(\lambda_0; x) = \frac{e^{-\chi^2/2}}{(2\pi)^{n/2}\,\sigma^{n}\,u'}, \qquad [22]
$$

where χ² is written instead of t² for
the square of the length of the vector
x − λ0 divided by σ², as is usual when
n is not necessarily 1.

As n increases, conditions for the 
applicability of Equation 22 will be 
encountered more and more rarely. 
For one reason, the sphere about x 
within which u(λ|H1) has to be
nearly constant has radius somewhat
larger than σ√n, and the larger that
sphere, the less plausible the assump- 
tion of constancy within it. Still 
worse, the spheres within which this 
density can reasonably be expected 
to remain nearly constant will typic- 
ally actually decrease in radius with 
increasing n. For example, in a study 
of three factors, each at four levels, 
the first-order interactions are ex- 
pressed by 27 parameters. To say 
that your opinion of these is diffuse 
with respect to some standard deviation
σ implies, among other things,
that even if you found out any 26 of
the parameters you would not feel
competent to guess the last one to
within several σ's. Even given the
hypothesis that the interactions have 
no tendency to be small, it is hard to 
envisage situations in which the 
implication would be realistic. This 
example serves incidentally to remind 
us that there are often many “null” 
hypotheses claiming some measure of 
our credence. For example: all inter- 
actions vanish; all that involve the 
first factor vanish; all above those 
of the first order vanish, and the first- 
order interactions are well explained 
by this or that simple theory; and so
on. In principle, these problems of 
multiple decision are natural out- 
growths of the two-hypothesis situa- 
tion, but much work remains to be 
done on them. 

For a specified x, the prior distribu- 
tion most pessimistic toward the null 
hypothesis is once more concentrated 
at $x$ and yields

$$L_{\min} = e^{-\chi^2/2}.$$


If $n$ is large, say at least 10, and the
null hypothesis is true, then it is
almost certain (before making the
measurement $x$) that $\chi^2$ will be
roughly equal to $n$. So, for large $n$,
$L_{\min}$ is very small indeed, even when
compared with significance levels of
classical tests applied to the same
data. Therefore, $L_{\min}$ will be of
almost no practical use in such cases.
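
To give a feel for how small this bound becomes, here is a minimal Python sketch that evaluates $e^{-\chi^2/2}$ at $\chi^2 = n$, the value the text describes as almost certain under a true null hypothesis when $n$ is large. The function name `l_min` and the particular values of n are illustrative assumptions, not part of the original analysis.

```python
import math


def l_min(chi_sq: float) -> float:
    """Lower bound on the likelihood ratio: exp(-chi^2 / 2)."""
    return math.exp(-chi_sq / 2.0)


# Under a true null hypothesis chi^2 is roughly n for large n,
# so the bound is evaluated at chi^2 = n (illustrative values of n).
for n in (10, 30, 100):
    print(f"n = {n:3d}   L_min at chi^2 = n: {l_min(n):.3e}")
```

Even a perfectly typical observation under the null hypothesis therefore yields a bound far below any conventional significance level, which is why the bound carries so little practical information.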

A somewhat more realistic approach 
in the general spirit of $L_{\min}$ would be
to consider that spherically sym- 
metrical distribution which would 
most discredit the null hypothesis. 
This approach might be worth some 
exploration, but is mathematically 
rather intractable. 

The subjective upper and lower 
bounds for L that were illustrated in 
one dimension are easy to generalize 
to n dimensions. They may well 
prove less serviceable as $n$ increases,
but they merit trial and study. 


We close this section with a sketchy report
of what happens when $u(\lambda \mid H_1)$ is itself a
spherical normal distribution about $\lambda_0$, with
variance $\tau^2$. We do so with particular diffidence,
because there is here even less justification
than before in hoping to approximate
$u(\lambda \mid H_1)$ by a normal distribution centered at
$\lambda_0$, and because the assumption of spherical
symmetry for this distribution will often be 
particularly unrealistic. Still, we hope that 
the exercise, regarded with caution, will be 
suggestive of truth which can later be verified 
in some more secure way. 
TABLE 5

VALUES OF $n_{.001}(\alpha)$ FOR WHICH THE BREAK-EVEN VALUE $\chi_0^2$ IS JUST SIGNIFICANT
AT THE .001 LEVEL FOR SELECTED VALUES OF $\alpha$

  $\alpha$    $\sigma/\tau$    $\chi_0^2$    $n_{.001}(\alpha)$
  .01      .010        18.4          2
  .1       .101        18.6          4
  .2       .204        26.8          8
  .5       .577        73.9         40
  .8      1.333       470          379
  .9      2.065     1,896        1,710


Letting $\alpha$ be, as before, $\sigma/\sqrt{\sigma^2 + \tau^2}$, Equation
21 becomes

$$L(\alpha, \chi^2) = \frac{1}{\alpha^n}\, e^{-\frac{1}{2}(1 - \alpha^2)\chi^2}.$$


For a fixed fraction $\alpha$ and very large $n$, $\chi^2$
is initially almost certain, given $H_0$, to be
within a few percent of $n$ and, given $H_1$,
within a few percent of $n/\alpha^2$. As follows
easily, it is initially almost sure that the
experiment will firmly lead to a correct
decision between $H_0$ and $H_1$, no matter how
close $\alpha$ is to 1, provided $n$ is sufficiently large.
For this reason, if for no other, we are bound
to be interested in values of $\alpha$ for large $n$ that
correspond to values of $\sigma/\tau$ so large that they
would render the experiment worthless if $n$
were small.

The value of $\chi^2$ that speaks neither for nor
against the null hypothesis for a specified $\alpha$ is

$$\chi_0^2 = \frac{-n \ln \alpha^2}{1 - \alpha^2},$$
an easy and natural generalization of Equation
20. For large $n$, it is not reasonable to
approximate $\chi_0^2$ by substituting 1 for $1 - \alpha^2$.
Since the coefficient of $n$ in $\chi_0^2$ is larger than
1 for every fraction $\alpha$ and since the value of $\chi^2$
that is just significant at, say, the .001 level
only slightly exceeds $n$ for sufficiently large $n$,
there is some first integer $n_{.001}(\alpha)$ at which the
break-even value $\chi_0^2$ is just significant at the
.001 level. Some representative values are
shown in Table 5. From the point of view of
this model of the testing situation, which is of
course not unobjectionable, the classical procedure
is startlingly prone to reject the null
hypothesis contrary to what would often be
very reasonable opinion.
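
The break-even computation can be checked numerically. The sketch below assumes the likelihood ratio for the normal alternative has the form $L = \alpha^{-n} \exp[-\tfrac{1}{2}(1 - \alpha^2)\chi^2]$ with $\alpha = \sigma/\sqrt{\sigma^2 + \tau^2}$, and recomputes the $\sigma/\tau$ and $\chi_0^2$ columns of Table 5 from its $(\alpha, n)$ pairs. The function names are hypothetical, chosen only for this illustration.

```python
import math


def sigma_over_tau(alpha: float) -> float:
    """Ratio sigma/tau implied by alpha = sigma / sqrt(sigma^2 + tau^2)."""
    return alpha / math.sqrt(1.0 - alpha ** 2)


def break_even_chi_sq(alpha: float, n: int) -> float:
    """chi_0^2 = -n ln(alpha^2) / (1 - alpha^2): the value at which L equals 1."""
    return -n * math.log(alpha ** 2) / (1.0 - alpha ** 2)


def likelihood_ratio(alpha: float, n: int, chi_sq: float) -> float:
    """Assumed form: L = alpha^(-n) * exp(-(1 - alpha^2) * chi^2 / 2)."""
    return alpha ** (-n) * math.exp(-0.5 * (1.0 - alpha ** 2) * chi_sq)


# (alpha, n) pairs taken from Table 5.
for alpha, n in [(0.01, 2), (0.1, 4), (0.2, 8), (0.5, 40), (0.8, 379), (0.9, 1710)]:
    chi0 = break_even_chi_sq(alpha, n)
    print(f"alpha={alpha:<5} sigma/tau={sigma_over_tau(alpha):6.3f} "
          f"chi0^2={chi0:8.1f}  L at chi0^2={likelihood_ratio(alpha, n, chi0):.3f}")
```

Any observed $\chi^2$ below the break-even value favors the null hypothesis under this model, even when a classical test at the .001 level would reject it.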

Paralleling the situation for $n = 1$, it is
$\alpha = \sqrt{n}/\chi$ that is most pessimistic toward
the null hypothesis for a specified value of
$\chi$. The likelihood for this artificial value of
$\alpha$ is

$$L_{\text{normin}} = \left(\frac{e\chi^2}{n}\right)^{n/2} e^{-\chi^2/2}.$$


Table 6 shows the values of $L_{\text{normin}}$ that
correspond to the values of $\chi^2$ just significant
at the .01 and .001 levels for several values
of $n$. Here as in the one-dimensional case
$L_{\text{normin}}$ is small, but not as small as classical
significance levels might suggest. In all these
cases $\alpha$ is unrealistically large.
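
As a rough check on these values, the following Python sketch evaluates $L_{\text{normin}} = (e\chi^2/n)^{n/2} e^{-\chi^2/2}$ and the corresponding most pessimistic $\alpha = \sqrt{n}/\chi$. The chi-square critical values are standard table values supplied here as assumptions (they are not listed in the text), and the function names are hypothetical.

```python
import math


def l_normin(n: int, chi_sq: float) -> float:
    """(e * chi^2 / n)^(n/2) * exp(-chi^2 / 2), computed in logs for stability."""
    log_l = 0.5 * n * (1.0 + math.log(chi_sq / n)) - 0.5 * chi_sq
    return math.exp(log_l)


def most_pessimistic_alpha(n: int, chi_sq: float) -> float:
    """alpha = sqrt(n) / chi, the value that minimizes the likelihood ratio."""
    return math.sqrt(n / chi_sq)


# Upper .01 critical values of chi-square for n degrees of freedom
# (standard table values, assumed here; not given in the text).
critical_01 = {1: 6.635, 3: 11.345, 10: 23.209}

for n, chi_sq in critical_01.items():
    print(n, round(most_pessimistic_alpha(n, chi_sq), 3), round(l_normin(n, chi_sq), 4))
```

Working in logarithms keeps the computation stable for the larger values of $n$, where the two factors are individually far outside ordinary floating-point range.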


This cursory glance at multidimensional
normally distributed observations
has the same general conclusions
as our more detailed study of the
unidimensional normal case. Although
the statistical theory of multidimensional
observations (classical or
Bayesian) is distressingly sketchy and
incomplete, drastic surprises about
the relation between classical and
Bayesian multidimensional techniques
have not turned up and now seem
unlikely.


TABLE 6

VALUES OF $L_{\text{normin}}$ THAT CORRESPOND TO THE VALUES OF $\chi^2$ JUST SIGNIFICANT
AT THE .01 AND .001 LEVELS FOR SELECTED VALUES OF $n$

                     $\chi^2_{.01}$                         $\chi^2_{.001}$
     n        $\alpha$    $\sigma/\tau$   $L_{\text{normin}}$       $\alpha$    $\sigma/\tau$   $L_{\text{normin}}$
     1       .388     .421     .1539       .339     .360     .0242
     3       .514     .600     .1134       .429     .475     .0166
    10       .656     .870     .0912       .581     .715     .0127
    30       .768    1.198     .0806       .709    1.005     .0108
   100       .858    2.075     .0742       .818    1.422     .0097
   300       .913    2.238     .0712       .887    1.919     .0092
 1,000       .950    3.059     .0696       .935    2.636     .0088
 3,000       .971    4.047     .0680       .962    3.499     .0087
10,000       .984    5.488     .0675       .979    3.798     .0086
    ∞       1.000      ∞       .0668      1.000      ∞       .0084

Some morals about testing sharp null 
hypotheses. At first glance, our gen- 
eral conclusion that classical pro- 
cedures are so ready to discredit null 
hypotheses that they may well reject 
one on the basis of evidence which is 
in its favor, even strikingly so, may 
suggest the presence of a mathe- 
matical mistake somewhere. Not so; 
the contradiction is practical, not 
mathematical. A classical rejection 
of a true null hypothesis at the .05 
level will occur only once in 20 times. 
The overwhelming majority of these 
false classical rejections will be based 
on test statistics close to the border- 
line value; it will often be easy to 
demonstrate that these borderline 
test statistics, unlikely under either 
hypothesis, are nevertheless more un- 
likely under the alternative than 
under the null hypothesis, and so 
speak for the null hypothesis rather 
than against it. 
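
A minimal sketch of this last point, under assumptions that are ours rather than the paper's: a single normal observation with standard error sigma = 1, a sharp null hypothesis at 0, and, under the alternative, a normal prior on the mean centered at 0 with standard deviation tau = 10. The observation x = 1.96 is just significant at the .05 level, yet its density is lower under the alternative than under the null hypothesis.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


sigma = 1.0   # standard error of the observation (assumed)
tau = 10.0    # prior sd of the mean under the alternative (assumed)
x = 1.96      # borderline observation, just significant at the .05 level

density_null = normal_pdf(x, 0.0, sigma)
# Marginal density of x under the alternative: normal with variance sigma^2 + tau^2.
density_alt = normal_pdf(x, 0.0, math.sqrt(sigma ** 2 + tau ** 2))
ratio = density_null / density_alt  # likelihood ratio in favor of the null

print(f"density under null:        {density_null:.4f}")
print(f"density under alternative: {density_alt:.4f}")
print(f"likelihood ratio:          {ratio:.2f}")
```

With a much more concentrated alternative (small tau) the ratio falls below 1, so the conclusion depends on how diffuse the alternative is taken to be.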

Bayesian procedures can strengthen 
a null hypothesis, not only weaken it, 
whereas classical theory is curiously 
asymmetric. If the null hypothesis 
is classically rejected, the alternative 
hypothesis is willingly embraced, but 
if the null hypothesis is not rejected, 
it remains in a kind of limbo of 
suspended disbelief. This asymmetry 
has led to considerable argument 
about the appropriateness of testing 
a theory by using its predictions as a 
null hypothesis (Grant, 1962; Guil- 
ford, 1942, see p. 186 in the 1956 edi- 
tion; Rozeboom, 1960; Sterling, 1960). 
For Bayesians, the problem vanishes, 
though they must remember that the 
null hypothesis is really a hazily defined
small region rather than a point.

The procedures which have been 
presented simply compute the likeli- 
hood ratio of the hypothesis that 
some parameter is very nearly a 
specified single value with respect to 
the hypothesis that it is not. They 
do not depend on the assumption of 
special initial credibility of the null 
hypothesis. And the general con- 
clusion that classical procedures are 
unduly ready to reject null hypotheses 
is thus true whether or not the null 
hypothesis is especially plausible a 
priori. At least for Bayesian statis- 
ticians, however, no procedure for 
testing a sharp null hypothesis is likely 
to be appropriate unless the null hy- 
pothesis deserves special initial cre- 
dence. It is uninteresting to learn 
that the odds in favor of the null hy- 
pothesis have increased or decreased 
a hundredfold if initially they were 
negligibly different from zero. 

How often are Bayesian and clas- 
sical procedures likely to lead to 
different conclusions in practice? 
First, Bayesians are unlikely to con- 
sider a sharp null hypothesis nearly so 
often as do the consumers of classical 
statistics. Such procedures make 
sense to a Bayesian only when his 
prior distribution has a sharp spike at 
some specific value; such prior dis- 
tributions do occur, but not so often 
as do classical null hypothesis tests. 

When Bayesians and classicists 
agree that null hypothesis testing is 
appropriate, the results of their pro- 
cedures will usually agree also. If the 
null hypothesis is false, the interocular 
traumatic test will often suffice to 
reject it; calculation will serve only 
to verify clear intuition. If the null 
hypothesis is true, the interocular 
traumatic test is unlikely to be of 
much use in one-dimensional cases, 
but may be helpful in multidimen- 
sional ones. In at least 95% of cases 
when the null hypothesis is true,
Bayesian procedures and the classical 
.05 level test agree. Only in border- 
line cases will the two lead to conflict- 
ing conclusions. The widespread 
custom of reporting the highest clas- 
sical significance level from among the 
conventional ones actually attained 
would permit an estimate of the 
frequency of borderline cases in pub- 
lished work; any rejection at the .05 
or .01 level is likely to be borderline. 
Such an estimate of the number of 
borderline cases may be low, since it 
is possible that many results not 
significant at even the .05 level remain 
unpublished. 

The main practical consequences for 
null hypothesis testing of widespread 
adoption of Bayesian statistics will 
presumably be a substantial reduction 
in the resort to such tests and a 
decrease in the probability of rejecting 
true null hypotheses, without sub- 
stantial increase in the probability of 
accepting false ones. 

If classical significance tests have 
rather frequently rejected true null 
hypotheses without real evidence, 
why have they survived so long 
and so dominated certain empirical 
sciences? Four remarks seem to shed 
some light on this important and 
difficult question. 

1. In principle, many of the re- 
jections at the .05 level are based on 
values of the test statistic far beyond 
the borderline, and so correspond to 
almost unequivocal evidence. In 
practice, this argument loses much of 
its force. It has become customary to 
reject a null hypothesis at the highest 
significance level among the magic
values, .05, .01, and .001, which the
test statistic permits, rather than to 
choose a significance level in advance 
and reject all hypotheses whose test 
statistics fall beyond the criterion 
value specified by the chosen significance
level. So a .05 level rejection
today usually means that the test 
statistic was significant at the .05 
level but not at the .01 level. Still, 
a test statistic which falls just short 
of the .01 level may correspond to 
much stronger evidence against a null 
hypothesis than one barely significant 
at the .05 level. The point applies 
more forcibly to the region between 
.01 and .001, and for the region 
beyond, the argument reverts to its 
original form. 

2. Important rejections at the .05 
or .01 levels based on test statistics 
which would not have been significant 
at higher levels are not common. 
Psychologists tend to run relatively 
large experiments, and to get very 
highly significant main effects. The 
place where .05 level rejections are 
most common is in testing inter- 
actions in analyses of variance—and 
few experimenters take those tests 
very seriously, unless several lines of 
evidence point to the same con- 
clusions. 

3. Attempts to replicate a result 
are rather rare, so few null hypothesis 
rejections are subjected to an em- 
pirical check. When such a check is 
performed and fails, explanation of 
the anomaly almost always centers on 
experimental design, minor variations 
in technique, and so forth, rather than 
on the meaning of the statistical 
procedures used in the original study. 

4. Classical procedures sometimes 
test null hypotheses that no one would 
believe for a moment, no matter what 
the data; our list of situations that 
might stimulate hypothesis tests 
earlier in the section included several 
examples. Testing an unbelievable 
null hypothesis amounts, in practice, 
to assigning an unreasonably large 
prior probability to a very small 
region of possible values of the true 
parameter. In such cases, the more 
the procedure is biased against the
null hypothesis, the better. The 
frequent reluctance of empirical scien- 
tists to accept null hypotheses which 
their data do not classically reject 
suggests their appropriate skepticism 
about the original plausibility of these 
null hypotheses. 


LIKELIHOOD PRINCIPLE 


A natural question about Bayes’ 
theorem leads to an important con- 
clusion, the likelihood principle, which 
was first discovered by certain classical 
statisticians (Barnard, 1947; Fisher, 
1956).

Two possible experimental outcomes
$D$ and $D'$, not necessarily of
the same experiment, can have the
same (potential) bearing on your
opinion about a partition of events $H_i$,
that is, $P(H_i \mid D)$ can equal $P(H_i \mid D')$
for each $i$. Just when are $D$ and $D'$
thus evidentially equivalent, or of the
same import? Analytically, when is

$$P(H_i \mid D) = \frac{P(D \mid H_i)\,P(H_i)}{P(D)} = \frac{P(D' \mid H_i)\,P(H_i)}{P(D')} = P(H_i \mid D')$$  [23]

for each $i$?


Aside from such academic possibilities
as that some of the $P(H_i)$ are
0, Equation 23 plainly entails that,
for some positive constant $k$ and for
all $i$,

$$P(D' \mid H_i) = k\,P(D \mid H_i).$$  [24]

But Equation 24 implies Equation 23,
from which it was derived, no matter
what the initial probabilities $P(H_i)$
are, as is easily seen thus:

$$P(D') = \sum_i P(D' \mid H_i)\,P(H_i) = k \sum_i P(D \mid H_i)\,P(H_i) = k\,P(D).$$
This conclusion is the likelihood
principle: Two (potential) data D and 
D’ are of the same import if Equation 
24 obtains. 
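
The principle is easy to confirm numerically. In the Python sketch below, the prior, the likelihoods, and the constant k are arbitrary illustrative numbers (assumptions, not values from the text); multiplying every $P(D \mid H_i)$ by the same positive constant leaves each posterior probability unchanged.

```python
def posterior(prior, likelihood):
    """Bayes' theorem over a finite partition: P(H_i | D) for each i."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)  # this is P(D)
    return [j / total for j in joint]


prior = [0.5, 0.3, 0.2]               # P(H_i), illustrative
lik_d = [0.10, 0.40, 0.25]            # P(D | H_i), illustrative
k = 0.2                               # any positive constant
lik_d_prime = [k * l for l in lik_d]  # P(D' | H_i) = k * P(D | H_i)

post_d = posterior(prior, lik_d)
post_d_prime = posterior(prior, lik_d_prime)
print(post_d)
print(post_d_prime)
```

The constant k cancels between numerator and denominator, exactly as in the derivation of $P(D') = kP(D)$.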

Since for the purpose of drawing
inference, the sequence of numbers
$P(D \mid H_i)$ is, according to the likelihood
principle, equivalent to any
other sequence obtained from it by
multiplication by a positive constant,
a name for this class of equivalent
sequences is useful and there is
precedent for calling it the likelihood
(of the sequence of hypotheses $H_i$
given the datum $D$). (This is not
quite the usage of Raiffa & Schlaifer,
1961.) The likelihood principle can
now be expressed thus: $D$ and $D'$ have
the same import if $P(D \mid H_i)$ and
$P(D' \mid H_i)$ belong to the same likelihood;
more idiomatically, if $D$ and $D'$
have the same likelihood.

If, for instance, the partition is twofold,
as it is when you are testing a null
hypothesis against an alternative hypothesis,
then the likelihood to which
the pair $[P(D \mid H_0), P(D \mid H_1)]$ belongs
is plainly the set of pairs of numbers
$[a, b]$ such that the fraction $a/b$ is
the already familiar likelihood ratio
$L(H_0; D) = P(D \mid H_0)/P(D \mid H_1)$. The
simplification of the theory of testing
by the use of likelihood ratios in place
of the pairs of conditional probabilities,
which we have seen, is thus an application
of the likelihood principle.

Of course, the likelihood principle
applies to a (possibly multidimensional)
parameter $\lambda$ as well as to a
partition $H_i$. The likelihood of $D$, or
the likelihood to which $P(D \mid \lambda)$ belongs,
is the class of all those functions
of $\lambda$ that are positive constant multiples
of (that is, proportional to) the
function $P(D \mid \lambda)$. Also, conditional
densities can replace conditional probabilities
in the definition of likelihood
ratios.

There is one implication of the likelihood
principle that all statisticians
seem to accept. It is not appropriate 
in this paper to pursue this implica- 
tion, which might be called the 
principle of sufficient statistics, very 
far. One application of sufficient 
statistics so familiar as almost to 
escape notice will, however, help bring 
out the meaning of the likelihood 
principle. Suppose a sequence of 100 
Bernoulli trials is undertaken and 20 
successes and 80 failures are recorded. 
What is the datum, and what is its 
probability for a given value of the 
frequency p? We are all perhaps 
overtrained to reply, “The datum is 20
successes out of 100, and its probability,
given $p$, is $C_{20}^{100}\,p^{20}(1 - p)^{80}$.”
Yet it seems more correct to say,
“The datum is this particular sequence
of successes and failures, and its
probability, given $p$, is $p^{20}(1 - p)^{80}$.”
The conventional reply is often more
convenient, because it would be costly
to transmit the entire sequence of
observations; it is permissible, because
the two functions $C_{20}^{100}\,p^{20}(1 - p)^{80}$
and $p^{20}(1 - p)^{80}$ belong to the same
likelihood; they differ only by the
constant factor $C_{20}^{100}$. Many classical
statisticians would demonstrate this 
permissibility by an argument that 
does not use the likelihood principle, 
at least not explicitly (Halmos & 
Savage, 1949, p. 235). That the two 
arguments are much the same, after 
all, is suggested by Birnbaum (1962). 
The legitimacy of condensing the 
datum is often expressed by saying 
that the number of successes in a 
given number of Bernoulli trials is a 
sufficient statistic for the sequence of 
trials. Insofar as the sequence of 
trials is not altogether accepted as 
Bernoullian—and it never is—the 
condensation is not legitimate. The 
practical experimenter always has 
some incentive to look over the se- 
quence of his data with a view to 
discovering periodicities, trends, or
other departures from Bernoullian 
expectation. Anyone to whom the 
sequence is not available, such as the 
reader of a condensed report or the 
experimentalist who depends on auto- 
matic counters, will reserve some 
doubt about the interpretation of the 
ostensibly sufficient statistic. 

Moving forward to another applica- 
tion of the likelihood principle, imag- 
ine a different Bernoullian experiment 
in which you have undertaken to 
continue the trials until 20 successes 
were accumulated and the twentieth 
success happened to be the one 
hundredth trial. It would be con- 
ventional and justifiable to report only 
this fact, ignoring other details of the 
sequence of trials. The probability 
that the twentieth success will be the 
one hundredth trial is, given p, easily 
seen to be $C_{19}^{99}\,p^{20}(1 - p)^{80}$. This is
exactly 1/5 of the probability of 20 
successes in 100 trials, so according to 
the likelihood principle, the two data 
have the same import. This con- 
clusion is even a trifle more immediate 
if the data are not condensed; for a 
specific sequence of 100 trials of which 
the last is the twentieth success has 
the probability $p^{20}(1 - p)^{80}$ in both
experiments. Those who do not 
accept the likelihood principle believe 
that the probabilities of sequences 
that might have occurred, but did not, 
somehow affect the import of the 
sequence that did occur. 
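
The factor of 1/5 can be verified with exact arithmetic. The Python sketch below (function names are hypothetical) computes the probability of 20 successes in 100 planned trials and the probability that the twentieth success falls on trial 100 when sampling continues until 20 successes, and prints their ratio for several illustrative values of $p$.

```python
import math
from fractions import Fraction


def prob_fixed_n(p: Fraction) -> Fraction:
    """P(20 successes in exactly 100 planned trials)."""
    return math.comb(100, 20) * p ** 20 * (1 - p) ** 80


def prob_fixed_successes(p: Fraction) -> Fraction:
    """P(the 20th success arrives on trial 100) when sampling until 20 successes."""
    return math.comb(99, 19) * p ** 20 * (1 - p) ** 80


# The ratio is the same constant for every p, so the two data belong to
# the same likelihood and therefore have the same import.
for p in (Fraction(1, 10), Fraction(1, 5), Fraction(1, 2)):
    print(p, prob_fixed_successes(p) / prob_fixed_n(p))
```

Because the ratio does not depend on $p$, no prior distribution over $p$ can make the two experiments lead to different posterior opinions.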

In general, suppose that you collect
data of any kind whatsoever (not
necessarily Bernoullian, nor identically
distributed, nor independent of each
other given the parameter $\lambda$), stopping
only when the data thus far collected
satisfy some criterion of a sort that is
sure to be satisfied sooner or later.
Then the import of the sequence of $n$
data actually observed will be exactly
the same as it would be had you
planned to take exactly $n$ observations
in the first place. It is not even 
necessary that you stop according to 
a plan. You may stop when tired, 
when interrupted by the telephone, 
when you run out of money, when you 
have the casual impression that you 
have enough data to prove your point, 
and so on. The one proviso is that 
the moment at which your observa- 
tion is interrupted must not in itself 
be any clue to $\lambda$ that adds anything
to the information in the data already 
at hand. A man who wanted to know 
how frequently lions watered at a 
certain pool was chased away by lions 
before he actually saw any of them 
watering there; in trying to conclude 
how many lions do water there he 
should remember why his observation 
was interrupted when it was. We 
would not give a facetious example 
had we been able to think of a serious 
one. A more technical discussion of 
the irrelevance of stopping rules to 
statistical analysis is on pages 36-42 
of Raiffa and Schlaifer (1961). 

This irrelevance of stopping rules to 
statistical inference restores a sim- 
plicity and freedom to experimental 
design that had been lost by classical 
emphasis on significance levels (in the 
sense of Neyman and Pearson) and on 
other concepts that are affected by 
stopping rules. Many experimenters 
would like to feel free to collect data 
until they have either conclusively 
proved their point, conclusively dis- 
proved it, or run out of time, money, 
or patience. Classical statisticians 
(except possibly for the few classical 
defenders of the likelihood principle) 
have frowned on collecting data one
by one or in batches, testing the total 
ensemble after each new item or 
batch is collected, and stopping the 
experiment only when a null hy- 
pothesis is rejected at some preset 
significance level. And indeed if an 
experimenter uses this procedure,
then with probability 1 he will
eventually reject any sharp null
hypothesis, even though it be true.
This is perhaps simply another illus- 
tration of the overreadiness of classical 
procedures to reject null hypotheses. 
In contrast, if you set out to collect 
data until your posterior probability 
for a hypothesis which unknown to 
you is true has been reduced to .01, 
then 99 times out of 100 you will 
never make it, no matter how many 
data you, or your children after you, 
may collect. (Rules which have 
nonzero probability of running forever 
ought not, and here will not, be called 
stopping rules at all.) 
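
The classical half of this contrast can be illustrated by simulation. The sketch below is a rough illustration under our own assumptions, not a procedure from the paper: data are standard normal, so the null hypothesis of zero mean is true; a two-sided z test at the .05 level is applied after every observation; and sampling stops at the first rejection or after a maximum number of observations. The seed, the number of simulated experiments, and the function names are arbitrary choices.

```python
import math
import random


def rejects_eventually(max_n: int, rng: random.Random, z_crit: float = 1.96) -> bool:
    """Sample N(0, 1) data one at a time, testing mean = 0 after every
    observation; return True as soon as the two-sided z test rejects."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        if abs(total) / math.sqrt(n) > z_crit:
            return True
    return False


def rejection_rate(max_n: int, experiments: int = 400, seed: int = 0) -> float:
    """Fraction of simulated experiments that reject the true null hypothesis."""
    rng = random.Random(seed)
    hits = sum(rejects_eventually(max_n, rng) for _ in range(experiments))
    return hits / experiments


for max_n in (1, 10, 100, 1000):
    print(f"up to {max_n:4d} observations: rejection rate {rejection_rate(max_n):.2f}")
```

The rejection rate climbs well above the nominal .05 as the permitted number of observations grows, which is the finite-sample shadow of the probability-1 result stated in the text.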

The irrelevance of stopping rules is 
one respect in which Bayesian pro- 
cedures are more objective than classi- 
cal ones. Classical procedures (with the 
possible exceptions implied above) in- 
sist that the intentions of the experi- 
menter are crucial to the interpreta- 
tion of data, that 20 successes in 100 
observations means something quite 
different if the experimenter intended 
the 20 successes than if he intended 
the 100 observations. According to 
the likelihood principle, data analysis 
stands on its own feet. The intentions 
of the experimenter are irrelevant to 
the interpretation of the data once 
collected, though of course they are 
crucial to the design of experiments. 

The likelihood principle also creates 
unity and simplicity in inference 
about Markov chains and other 
stochastic processes (Barnard, Jen- 
kins, & Winsten, 1962), which are 
sometimes applied in psychology. 
It sheds light on many other problems 
of statistics, such as the role of un- 
biasedness and Fisher’s concept of 
ancillary statistic. A principle so 
simple with consequences so pervasive 
is bound to be controversial. For 
dissents see Stein (1962), Wolfowitz 
(1962), and discussions published
with Barnard, Jenkins, and Winsten 
(1962), Birnbaum (1962), and Savage 
et al. (1962) indexed under likelihood 
principle. 


IN RETROSPECT


Though the Bayesian view is a 
natural outgrowth of classical views, 
it must be clear by now that the 
distinction between them is impor- 
tant. Bayesian procedures are not 
merely another tool for the working 
scientist to add to his inventory along 
with traditional estimates of means, 
variances, and correlation coefficients, 
and the t test, F test, and so on. That
classical and Bayesian statistics are 
sometimes incompatible was illus- 
trated in the theory of testing. For, 
as we saw, evidence that leads to 
classical rejection of the null hypoth- 
esis will often leave a Bayesian more 
confident of that same null hypothesis 
than he was to start with. Incom- 
patibility is also illustrated by the 
attention many classical statisticians 
give to stopping rules that Bayesians 
find irrelevant. 

The Bayesian outlook is flexible, 
encouraging imagination and criticism 
in its everyday applications. Bayes- 
ian experimenters will emphasize suit- 
ably chosen descriptive statistics in 
their publications, enabling each 
reader to form his own conclusions. 
Where an experimenter can easily 
foresee that his readers will want the 
results of certain calculations (as for 
example when the data seem suffi- 
ciently precise to justify for most 
readers application of the principle of 
stable estimation) he will publish 
them. Adoption of the Bayesian out- 
look should discourage parading sta- 
tistical procedures, Bayesian or other, 
as symbols of respectability pretending
to give the imprimatur of mathematical
logic to the subjective process
of empirical inference. 

We close with a practical rule which 
stands rather apart from any conflicts 
between Bayesian and classical sta- 
tistics. The rule was somewhat 
overstated by a physicist who said, 
“As long as it takes statistics to find 
out, I prefer to investigate something 
else.” Of course, even in physics 
some important questions must be 
investigated before technology is suffi- 
ciently developed to do so definitively. 
Still, when the value of doing so is 
recognized, it is often possible so to 
design experiments that the data 
speak for themselves without the 
intervention of subtle theory or in- 
secure personal judgments. Estima- 
tion is best when it is stable. Rejec- 
tion of a null hypothesis is best when 
it is interocular. 
MAGNITUDE SCALES, CATEGORY SCALES,
AND FECHNERIAN INTEGRATION ¹
In the theoretical part of the paper, Fechnerian integration is shown
to be a sound procedure mathematically. In the 2nd part, 3 series of 
experiments are described in which length of lines was scaled by the 
methods of magnitude estimation and category rating for 3 different 
exposure times. The subjective Weber function was obtained from the 
intraindividual SDs of the magnitude estimates, and it was demon- 
strated that the Fechner integral of this function with respect to 
subjective magnitude agrees with the category scale in all 3 cases. 
Finally, theoretical implications of the result are discussed. 


Consequent upon the development 
of direct scaling methods in psycho- 
physics, Fechner’s logarithmic law has 
fallen into disrepute along with the 
reasoning from which it is derived. 
Thus Stevens (1961a, 1961b), for in- 
stance, has rejected Fechner’s law 
on both theoretical and empirical 
grounds. He has found the relation 
between sensory magnitude, measured 
on a ratio scale, and physical magni- 
tude to be a power function rather 
than a logarithmic function (Stevens, 
1957, 1960). Furthermore, Luce and 
Edwards (1958) have concluded that 
Fechnerian integration is permissible 
only in the special case in which 
Weber’s law holds. 

However, when a psychophysical 
scaling experiment aims at an interval 
scale, as in category estimation, the 
psychophysical relation seems to con- 
stitute a compromise between Fech- 
ner’s logarithmic and Stevens’ power 
function. At least this appears to be 
the case for prothetic continua such 
as brightness and loudness (Stevens & 
Galanter, 1957). In a previous paper 
(Eisler, 1962b) I have shown that the 
category scale may be regarded as a
Fechner integral, and that deviations 
from Weber's law in certain sub- 
jective continua are at the root of 
corresponding deviations from the 
logarithmic function. This model was 
confirmed empirically for the loudness 
and softness of white noise and for the 
intensity of smell (Eisler, 1962a, 
1963). 

The present paper deals with the 
problem of Fechnerian integration 
from the mathematical point of view 
and once again presents a corrobora- 
tion of the model (in this case for 
length of lines). 


FECHNERIAN INTEGRATION 


Fechnerian integration is regarded 
here as a computational method by 
which one set of scale values can be 
derived from another when there is 
some known measure of uncertainty 
for both sets. The Fechner integral 
proper yields a sensation scale as a 
function of the corresponding stimulus 
scale. This function typically is 
logarithmic, since it is based (a) on 
Weber’s law, which holds that the 
measure of the observer’s uncertainty 
in physical units is proportional to 
stimulus magnitude; and (b) on the 
postulation that the measure of the 
observer's uncertainty in subjective
units is constant over the whole range 
investigated (cf. Luce & Edwards, 
1958). 
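
Assumptions (a) and (b) can be turned into a small numerical check. The Python sketch below integrates $d\Phi / (k\Phi)$ from a starting stimulus value of 1 and compares the result with the closed form $\ln(\Phi)/k$. The Weber fraction k = 0.05, the starting value, and the function names are illustrative assumptions.

```python
import math


def fechner_integral(weber_fn, phi_0: float, phi: float, steps: int = 100_000) -> float:
    """Numerically integrate d(phi) / weber_fn(phi) from phi_0 to phi
    (midpoint rule), i.e. count subjective units of constant uncertainty."""
    width = (phi - phi_0) / steps
    return sum(width / weber_fn(phi_0 + (i + 0.5) * width) for i in range(steps))


k = 0.05  # Weber fraction (assumed for illustration)


def weber_law(phi: float) -> float:
    """Weber's law: uncertainty in physical units proportional to magnitude."""
    return k * phi


for phi in (2.0, 10.0, 100.0):
    numeric = fechner_integral(weber_law, 1.0, phi)
    closed_form = math.log(phi) / k
    print(f"phi={phi:6.1f}  numeric={numeric:8.3f}  ln(phi)/k={closed_form:8.3f}")
```

Passing a different `weber_fn` gives the corresponding non-logarithmic scale, which is the sense in which the integral serves here as a general computational method.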

In the present paper, Fechnerian 
integration is used to derive a category 
scale from the corresponding magni- 
tude scale. The present approach 
differs from typical procedure as 
follows: 


1. Stimulus and sensation scales are 
replaced by magnitude and category 
scales. The relation sought is thus a 
relation between two subjective scales. 

2. As a measure of the uncertainty 
of the independent variable (magni- 
tude scale) there has been chosen the 
intraindividual standard deviation of 
the observers’ magnitude estimates. 
The Fechner integral proper is based 
on the jnd on the stimulus side. 

3. As a measure of the uncertainty 
of the dependent variable (category 
scale) there has been chosen the intra- 
individual standard deviation of the 
observers’ category estimates. Fech- 
ner worked with sensation jnd’s. In 
neither case does the measure really 
enter into the calculations. There is, 
however, an important difference: 
whereas Fechner postulated the con- 
stancy of the subjective jnd, the SDs 
from category ratings can be calcu- 
lated and the assumption of their 
constancy subjected to empirical test. 

4. Weber’s law does not hold for 
the magnitude scale and its standard 
deviation for the experiments pre- 
sented below. 


Whereas it is hard to see how the 
changes described in Points 1-3 
could affect the validity of Fechnerian 
integration as a calculating procedure, 
the fact that Weber’s law does not 
obtain (Point 4) raises some doubt as 
to whether it is permissible to use 
this method in the case at hand. 
Fechnerian integration is based on the 
“mathematical auxiliary principle”
that differences (jnd’s) may be re- 
placed by differentials and the result- 
ing differential equation integrated. 
Luce and Edwards (1958) point out 
that this principle is not generally 
valid, though it happens to be ap- 
plicable when Weber's law or its linear 
generalization holds. Though their
argument is based on jnd’s as differ- 
ences, it is equally valid for SDs. 

Luce and Edwards show that, as
long as the differences of the independent
variable (stimulus scale) are
finite, the differences of the dependent 
variable (sensation scale), calculated 
from the Fechner integral, are not 
constant, i.e., independent of the 
stimulus values, and thus Fechner’s 
postulation is violated. In this situa- 
tion Luce and Edwards recommend 
graphical addition of jnd’s instead of 
integration of the Weber function.²
But the graphical method also en- 
counters difficulties. Consider, for 
instance, the Weber function 


Ab = ko [1] 
where $\Phi$ refers to the stimulus scale
and $\Delta\Phi$ to the jnd. Let us, for the
sake of argument, change the custom- 
ary cutoff point on the psychometric 
function, i.e., 75%, to, say, 55%. 
Since the size of the jnd is determined 
by the cutoff point chosen, we obtain 
smaller jnd’s. Assuming that the 
nature of the Weber function is unaltered,
the parameter $k$ in Equation 1
decreases. As Figure 1 demonstrates, 
however, the value of the parameter 
k is crucial for the form of the sensa- 
tion scale constructed by the graphical 
method (cf. Luce, 1959, p. 34). The 
Fechner integral, on the other hand, 


2 A Weber function is any function that
relates a measure of uncertainty of a psycho- 
physical variable to its central values. 
Weber's law is a special case of a Weber 
function. 
is independent of the particular k,
since k does not enter into the integration
and only determines the arbitrary
unit of the sensation scale. In Figure 
1, all sensation scales have undergone 
a linear transformation (permissible 
for interval scales) to make the end 
points coincide and the functions 
thereby comparable. As I see it we 
are faced with a dilemma. Either our 
sensation scale changes with the 
arbitrary cutoff point of the psycho- 
metric function—entailing a lack 
of uniqueness—or, if integration is 
chosen, it seems that a self-contra- 
dictory model must be accepted. 
Fortunately, things are not quite as 
bad as this. The error involved in 
accepting Fechner’s model decreases 
with decreasing size of the jnd and 
vanishes when the jnd becomes in- 
finitesimal. The Fechner integral, 
however, remains unchanged as long 
as the decrease of the jnd entails only 
a decrease of the parameter k. (This 
statement is valid for all Weber 
functions, as long as they can be 
described as Δφ = kf(φ), k ≠ 0,
since k does not affect the integration.)
A sensation function derived through 
Fechnerian integration is based im- 
plicitly on infinitesimal jnd’s, or, to 
express the same thing operationally, 
on jnd’s taken from a cutoff point 
which approaches 50% without limit 
(cf. Björkman, 1958). Since the scale
obtained in this way is unique it 
seems to be preferable to one con- 
structed from finite jnd’s. 
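The contrast drawn above between cumulating finite jnd’s and integrating the Weber function can be sketched numerically. The following is a minimal illustration, not the procedure used by Luce and Edwards: it assumes a Weber function of the power form Δφ = kφ^p with p = 2 chosen for illustration, the range .05 to 10, a linear interpolation of the last partial jnd, and hypothetical function names.

```python
import math

def fechner_integral(x, lo, k, p):
    """Integral of d(phi) / (k * phi**p) from lo to x (closed form)."""
    if p == 1:
        return math.log(x / lo) / k
    return (x ** (1 - p) - lo ** (1 - p)) / (k * (1 - p))

def cumulated_jnds(x, lo, k, p):
    """Count jnd steps phi -> phi + k * phi**p from lo up to x.

    The final partial step is interpolated linearly (an assumption).
    """
    phi, steps = lo, 0
    while True:
        nxt = phi + k * phi ** p
        if nxt > x:
            return steps + (x - phi) / (nxt - phi)
        phi, steps = nxt, steps + 1

def normalized(scale, x, lo, hi, k, p):
    """Linear transformation making the scale run from 0 at lo to 1 at hi."""
    return scale(x, lo, k, p) / scale(hi, lo, k, p)

# Compare both constructions at phi = 1 for several values of k.
for k in (0.05, 0.5, 2.0):
    by_steps = normalized(cumulated_jnds, 1.0, 0.05, 10.0, k, 2)
    by_integral = normalized(fechner_integral, 1.0, 0.05, 10.0, k, 2)
    print(k, round(by_steps, 4), round(by_integral, 4))
```

Under these assumptions the normalized integral is the same for every k, whereas the normalized cumulated scale moves with k and approaches the integral as k (and hence the jnd) shrinks.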

The same argument holds when 
applied to SDs instead of jnd’s. To
pick just one (whole) standard devia- 
tion as a measure of uncertainty is as 
arbitrary as choosing the 75% cutoff 
point. Fechnerian integration in this 
case implies that in reality an in- 
finitesimal fraction of an SD is used. 

Perhaps it should be pointed out 
[Figure 1: sensation magnitude ψ as a function of stimulus magnitude φ.]

FIG. 1. Hypothetical Fechner functions
derived from the Weber function Δφ = kφ²
(.05 ≤ φ ≤ 10) by cumulating jnd’s (see
Figure 1 in Luce & Edwards, 1958) and by
integration for different values of the parameter
k. (The end points of all curves have
been made to coincide by means of linear
transformations of the sensation scales.)


that the reasoning above refers to 
Fechnerian integration as a method, 
aside from its content. It does not 
imply equality between jnd's and SDs. 

Now the formula for the category
scale can be derived as follows:

$$\Delta\psi = k_1 f(\psi) \qquad [2]$$

$$\Delta K = k_2 \qquad [3]$$

where ψ is the magnitude scale, Δψ
its standard deviation, and k₁f(ψ) its
Weber function. K and ΔK denote
the category scale and its standard
deviation. Changing to differentials
and dividing Equation 3 by Equation 2,
we have, with k = k₂/k₁,

$$\frac{dK}{d\psi} = \frac{k}{f(\psi)} \qquad [4]$$

and integrating Equation 4 gives

$$K = k \int \frac{d\psi}{f(\psi)} \qquad [5]$$
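As a worked check on the integral form of the category scale, two Weber functions can be substituted for f(ψ). This is only a sketch: the integration constant c, the use of the natural logarithm, and the notation ψ₀ for the intercept of a linear Weber function are assumptions introduced here for illustration.

```latex
% Weber's law proper: f(\psi) = \psi
K = k \int \frac{d\psi}{\psi} = k \ln \psi + c

% Linear generalization of Weber's law: f(\psi) = \psi - \psi_0
K = k \int \frac{d\psi}{\psi - \psi_0} = k \ln(\psi - \psi_0) + c
```

The first case yields a purely logarithmic relation between category and magnitude scales; the second yields a logarithm of the magnitude shifted by ψ₀.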


EXPERIMENTS AND RESULTS 


The continuum investigated was 
length of lines, presented tachistoscopically
for three durations. Since
earlier studies by Stevens and Galan- 
ter (1957) have shown that the 
category scale for length of lines, when 
plotted against the magnitude scale, 
exhibits only a slight curvature, the 
continuum may be said to be only 
slightly prothetic (cf. Eisler, 1963). 
In the investigation on the loudness 
and softness of white noise, and the 
intensity of smell mentioned above, 
the relation between the category and 
magnitude scales deviated from the 
logarithmic relation because the 
Weber function turned out to be the
linear generalization of Weber’s law. 
The straight line describing the growth 
of intraindividual SDs with subjective 
magnitude did not pass through the 
origin. If the (negative) ψ intercept
is denoted by ψ₀, the K-ψ relation
calculated by Fechnerian integration 
(Eisler, 1962b) can be expressed as: 


$$K = \alpha \log(\psi - \psi_0) + \beta \qquad [6]$$


As the absolute value of ψ₀ increases
(ψ₀ becomes more negative), the
curvature given by Equation 6 de- 
creases (cf. Eisler, 1962b, Figure 2) 
and the continuum can be said to grow 
less prothetic. 
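The claim that the curvature of the corrected log function decreases as the intercept becomes more negative can be illustrated with a short sketch. The curvature measure used here (how far the normalized curve lies above its chord at the midpoint of the range), the range 1 to 20, and the parameter values α = 1, β = 0 are all hypothetical choices made for illustration.

```python
import math

def category_from_magnitude(psi, psi0, alpha=1.0, beta=0.0):
    """Corrected log function: K = alpha * log(psi - psi0) + beta."""
    return alpha * math.log(psi - psi0) + beta

def midpoint_bulge(psi0, lo=1.0, hi=20.0):
    """Height of the normalized K curve above the chord at the midpoint.

    K is first rescaled linearly to run from 0 at lo to 1 at hi, so a
    straight line would give exactly 0.
    """
    k_lo = category_from_magnitude(lo, psi0)
    k_hi = category_from_magnitude(hi, psi0)
    k_mid = category_from_magnitude((lo + hi) / 2, psi0)
    return (k_mid - k_lo) / (k_hi - k_lo) - 0.5

# Intercepts of increasing absolute value (increasingly negative).
for psi0 in (0.0, -5.0, -50.0):
    print(psi0, round(midpoint_bulge(psi0), 3))
```

With ψ₀ = 0 the function is the plain logarithm and the bulge is largest; as ψ₀ becomes more negative the curve flattens toward a straight line.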

The aim of the present investigation 
was: 


1) To test the hypothesis that the 
category scale is the Fechner integral 
of the subjective Weber function. 

2) To study the relations among 
stimulus exposure time, the ψ intercept,
and the degree of “protheticness,”
on the assumption that the
generalized Weber law holds also for 
length of lines. 


Whereas it was possible to predict 
the category scale by means of 
Fechnerian integration, Weber’s law 
did not hold over the whole range 
investigated; in spite of varying ψ₀s,
the category scales were much the 
same for all three durations of exposure.


Subjects and Procedures 


Thirteen undergraduate students of psy- 
chology served as subjects. The stimuli were 
luminous horizontal strips projected on a 
screen. The room was darkened so as to 
minimize the discernibility of the edges of the 
screen, in an effort to avoid giving the subjects 
anchors that might have influenced them to 
judge position rather than length. There is 
some doubt, however, as to whether this 
precaution was completely successful. 

All 15 strips were 1.2 centimeters wide as 
measured on the screen, and their lengths 
varied between 3.5 and 214.0 centimeters. 
(See Table 1, Column 1 for the spacing of the 
stimuli.) The subjects sat 641 centimeters 
behind the screen. They presented them- 
selves with the stimuli by pressing a switch, 
and they were allowed as many presentations 
as they wanted before giving a judgment. 

There were three experimental series each 
containing two experiments: (a) a magnitude 
estimation experiment in which the ninth of 
the 15 stimuli (on a scale from shortest to 
longest), 127.5 centimeters, was the standard, 
and was assigned a subjective magnitude of 
10 and (b) a category rating experiment with 
nine categories. The subject was presented
with the standard in the magnitude estimation 
experiments, or the two extreme stimuli in the 
category rating experiments, once immedi- 
ately after the instruction was given. The 
three series differed in the length of the 
exposure time (the projector was fed through 
a timer): for Series I, 0.12 seconds; for Series
II, 0.50 seconds; in Series III the subjects
could look at the stimulus as long as they 
wished by keeping the button pressed. 

The same subjects took part in all the 
experiments under all the conditions; the 
same stimuli were used in each experiment. 
Everything possible was rotated and randomized:
the order of series, the order of
experiments among the subjects, and the order of
stimuli presented in each experiment to each
subject. No subject was used in more than
one experiment on any one day.

Each subject made four judgments of each 
stimulus under each of the conditions. 


Magnitude and Category Scales 


Table 1 gives the stimulus values 
together with the subjective magni- 
tudes and category values. The 
subjective magnitudes were computed 
TABLE 1

SUBJECTIVE MAGNITUDES (ψ) AND CATEGORIES (K) WITH THEIR INTRAINDIVIDUAL
STANDARD DEVIATIONS σ FOR LENGTH OF LINES (φ)
UNDER THREE EXPERIMENTAL CONDITIONS

         Series I                              Series III
 φ (cm) |   ψ    |   σψ   |   K   |   σ    |   K   |  σK
    3.5 |  1.524 |   .385 | 1.000 |   .286 | 1.000 | .000
        | (.437) | (.111) |       | (.104) |       |
   10.7 |  1.837 |   .302 | 1.731 |   .408 | 1.769 | .240
        |        |        |       | (.306) |       |
   21.6 |  2.260 |   .284 | 2.308 |   .158 | 2.442 | .318
        |        |        |       | (.144) |       |
   42.5 |  3.682 |   .445 | 3.192 |   .410 | 3.288 | .392
   63.1 |  5.189 |   .537 | 3.981 |   .591 | 4.096 | .399
   85.0 |  6.633 |   .685 | 4.692 |   .815 | 4.942 | .422
   96.6 |  7.563 |   .723 | 5.288 |   .725 | 5.404 | .339
  107.0 |  8.367 |   .621 | 5.558 |   .652 | 5.615 | .354
  127.5 | 10.076 |   .744 | 6.250 |   .744 | 6.346 | .367
  138.0 | 10.760 |   .661 | 6.500 |   .995 | 6.654 | .310
  150.4 | 11.720 |  1.122 | 6.942 |   .989 | 6.885 | .379
  171.0 | 14.028 |   .979 | 7.481 |   .859 | 7.692 | .379
  181.4 | 14.656 |  1.019 | 7.962 |   .789 | 7.942 | .286
  192.3 | 16.000 |   .839 | 8.231 |   .641 | 8.250 | .333
  214.0 | 17.905 |   .857 | 8.942 |  1.023 | 8.962 | .170


Note. The lowest magnitudes are corrected values (see text); the corresponding uncorrected values of
ψ and σψ are given in parentheses.


[Figure 2: panels plotting the magnitude scale and the category scale against length of lines.]

FIG. 2. Magnitude (Ia, IIa, IIIa) and category (Ib, IIb, IIIb) scales for length of lines
as a function of physical length for three different periods of exposure of the stimulus. (The
magnitude scales are fitted by power functions; the points denoted by triangles have been
excluded in fitting the curve.)
by taking the arithmetic mean of the
four judgments made by each ob- 
server, for each stimulus and condi- 
tion, and the geometric mean of the 
arithmetic means of all the observers. 
In other words, the arithmetic mean 
is taken within, and the geometric 
mean between, the observers. This 
procedure corresponds to the way in 
which the intraindividual SDs for the 
magnitude estimations were calcu- 
lated. The category judgments were 
averaged simply by taking the arith- 
metic mean of all 52 judgments 
obtained for each stimulus under each 
condition. 
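The two averaging rules just described can be written out as a short sketch. The function names and the sample judgments are hypothetical; only the rules themselves (arithmetic mean within observers and geometric mean between observers for magnitude estimates, plain arithmetic mean of all pooled judgments for category ratings) come from the text.

```python
import math

def magnitude_scale_value(judgments_by_observer):
    """Arithmetic mean within each observer, geometric mean across observers."""
    observer_means = [sum(j) / len(j) for j in judgments_by_observer]
    mean_log = sum(math.log(m) for m in observer_means) / len(observer_means)
    return math.exp(mean_log)

def category_scale_value(judgments_by_observer):
    """Arithmetic mean of all judgments pooled over observers."""
    pooled = [x for j in judgments_by_observer for x in j]
    return sum(pooled) / len(pooled)

# Hypothetical data: two observers, four judgments each, for one stimulus.
magnitude_judgments = [[8, 12, 10, 10], [20, 20, 20, 20]]
category_judgments = [[4, 5, 5, 6], [6, 6, 7, 5]]

print(magnitude_scale_value(magnitude_judgments))
print(category_scale_value(category_judgments))
```

In the experiment itself there were 13 observers with four judgments each, which gives the 52 category judgments per stimulus and condition mentioned above.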

The subjective magnitude is plotted 
as a function of the physical stimulus 
scale in Figure 2 (Ia, IIa, IIIa). The




Hannes EISLER 


Roman numerals in the figures refer to 
the three series of different exposure 
durations. The three plots, Ia, IIa, and
IIIa, reveal a sigmoid trend, indicating
that the whole range of lengths of
line studied cannot be described by a 
power function. The lowest points 
(denoted by triangles in the figure) 
were therefore excluded and the 
others fitted to a power function by 
the method described by Ekman 
(1961). The lines indicate the fitted 
power functions. 
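Fitting a power function after excluding the deviating lowest points can be sketched as follows. This is not Ekman’s (1961) method, which is not described here; the sketch substitutes an ordinary least squares fit on logarithms, and the function name and the data are hypothetical.

```python
import math

def fit_power(phi, psi, n_excluded=0):
    """Fit psi = k * phi**n by least squares on logarithms.

    The n_excluded lowest stimuli are left out of the fit.
    """
    pairs = sorted(zip(phi, psi))[n_excluded:]
    xs = [math.log(p) for p, _ in pairs]
    ys = [math.log(s) for _, s in pairs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    n = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    k = math.exp(y_bar - n * x_bar)
    return k, n

# Hypothetical data: an exact power function (k = 2, n = 0.5)
# except for a deviating lowest point.
phi = [1.0, 2.0, 4.0, 8.0, 16.0]
psi = [5.0] + [2.0 * p ** 0.5 for p in phi[1:]]

k, n = fit_power(phi, psi, n_excluded=1)
print(k, n)
```

A corrected value for an excluded stimulus would then be read off the fitted function as k * phi**n.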

Figure 2 (Ib, IIb, IIIb) shows the
category scales as a function of physical
length; the lines are smooth curves
fitted to the data points. Here the
lowest points do not break the curve
as they do in Figure 2 (Ia, IIa, IIIa).


[Figure 3: intraindividual standard deviation plotted against length of lines (magnitude scale) and length of lines (category scale).]

FIG. 3. Intraindividual SDs as a function of subjective magnitude (Ia, IIa, IIIa) and
categories (Ib, IIb, IIIb) for length of lines under three experimental conditions. (In Figures
Ia, IIa, and IIIa the empirical SDs are represented by dots except at the low end, where the
points are derived from corrected values (see text); the corresponding uncorrected values are
plotted as triangles. To emphasize the trend the points have also been smoothed over three
values, shown as circles. The straight lines are fitted by eye over the region within which Weber’s
generalized law seems to hold.)
FIG. 4. Empirical category scale values as a function of the Fechner integral and empirical
and predicted category scales as a function of the magnitude scale. (Figures Ia, IIa, and IIIa
show empirical category values for length of lines under Experimental Conditions I, II, and
III as a function of the computed Fechner integrals. The lowest points have been corrected
(see text), and the corresponding uncorrected values are indicated by triangles. The straight
lines constitute a least squares fit. Figures Ib, IIb, and IIIb show the category scale as a
function of the magnitude scale. The curves correspond to the lines in Figures Ia, IIa, and
IIIa. The triangles are the uncorrected values.)

The curvature is slight and far from
logarithmic for all lengths of exposure.


Variability

Intraindividual SDs were calculated 
from the magnitude estimations ac- 
cording to a method described by 
Eisler (1962a). They are given in 
Table 1 and plotted in Figure 3 (Ia,
IIa, IIIa). The deviating magnitude
values (triangles in Figure 2, Ia, IIa,
IIIa) were replaced by values derived
from the power functions. The SDs 
were computed for both the original 
and the corrected values. The uncor- 
rected values are indicated by tri- 
angles, the corrected values by dots. 
Because of the considerable scatter, 
the SDs have also been smoothed over 
three points (moving average), designated
by circles, to emphasize the
trend. Figure 3 (Ia, IIa, IIIa) shows
that the generalized Weber law is valid
only for the lower part of the range of
the magnitude estimations. The ψ
intercepts differ with exposure time.
The straight lines were fitted by eye
to the part of the range where Weber's
generalized law holds.
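The smoothing over three points mentioned above can be sketched as a centered moving average. How the end points were treated is not stated; averaging them over the two available values is an assumption made here, and the function name and sample values are hypothetical.

```python
def smooth_over_three(values):
    """Centered three-point moving average.

    End points are averaged over the two values available there
    (an assumption; the original treatment is not specified).
    """
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - 1): i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Hypothetical SDs for four neighboring stimuli.
print(smooth_over_three([1.0, 2.0, 6.0, 4.0]))  # [1.5, 3.0, 4.0, 5.0]
```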

The intraindividual SDs for the 
category ratings are computed in the 
standard way; they are given in 
Table 1 and Figure 3 (Ib, IIb, IIIb).
Except at the extremes of the range
the variability of the category ratings
is roughly constant (cf. Point 3 on
p. 244).


Fechnerian Integration 


Since the Weber functions obtained 
could not be expressed in a simple 
equation, integration had to be performed
numerically. The reciprocals
of all SDs were integrated with respect 
to subjective magnitude by the trape- 
zoid method, point by point. The 
plots of the empirical category scale 
values as a function of the Fechner 
integral are shown in Figure 4 (Ia, IIa,
IIIa). (The lowest stimulus was arbitrarily
assigned the value one on the
Fechner integral.) The triangles 
stand for uncorrected deviating points 
and the circles corresponding to the 
same stimuli for corrected points. 
The other values were calculated from 
the magnitude estimations obtained 
empirically, not from the power 
functions. The straight lines were 
fitted by the method of least squares. 
The curves in Figure 4 (Ib, IIb, IIIb), 
which are derived from the straight 
lines of Figure 4 (Ia, IIa, IIIa), show
the predicted category scales as a func- 
tion of the magnitude scales. The 
triangles again represent uncorrected 
points. 
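The numerical procedure described in this section (trapezoid integration of the reciprocal SDs with respect to subjective magnitude, followed by a least squares line relating category values to the integral) can be sketched as follows. The numbers are transcribed from the first ψ, σψ, and K columns of Table 1 as they appear here; that these three columns belong to the same series is an assumption, and the function names are hypothetical.

```python
import math

# First psi, sigma-psi, and K columns of Table 1 (assumed to be one series).
psi = [1.524, 1.837, 2.260, 3.682, 5.189, 6.633, 7.563, 8.367,
       10.076, 10.760, 11.720, 14.028, 14.656, 16.000, 17.905]
sd = [.385, .302, .284, .445, .537, .685, .723, .621,
      .744, .661, 1.122, .979, 1.019, .839, .857]
category = [1.000, 1.731, 2.308, 3.192, 3.981, 4.692, 5.288, 5.558,
            6.250, 6.500, 6.942, 7.481, 7.962, 8.231, 8.942]

def trapezoid_fechner_integral(psi, sd, start=1.0):
    """Integrate 1/SD over psi point by point; the lowest stimulus gets `start`."""
    values = [start]
    for i in range(1, len(psi)):
        mean_reciprocal = 0.5 * (1 / sd[i] + 1 / sd[i - 1])
        values.append(values[-1] + mean_reciprocal * (psi[i] - psi[i - 1]))
    return values

def least_squares_line(x, y):
    """Slope, intercept, and correlation of the least squares line y on x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy / math.sqrt(sxx * syy)

integral = trapezoid_fechner_integral(psi, sd)
slope, intercept, r = least_squares_line(integral, category)
print(slope, intercept, r)
```

If the category scale is the Fechner integral of the subjective Weber function, the category values should fall close to a straight line when plotted against the integral, so a correlation near one would be expected.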


DISCUSSION 


The fit of the curves to the points 
in Figure 4 is close enough to confirm 
again the hypothesis that a category 
scale is the Fechner integral of the 
subjective Weber function. A closer 
scrutiny, however, reveals that the 
agreement between computed and 
empirical values is not quite so good 
in the lower range. This disagreement 
may be attributed to at least three 
factors: 


1. The corrected points are extrap- 
olations and thus relatively un- 
certain. 

2. The SDs for the lower points 
have small numerical values, so that
errors are greatly enlarged when 
reciprocals are taken. 

3. The extreme stimuli are prob- 
ably recognized by the observers more 
often than the middle stimuli. This 
bias would produce spuriously low 
SDs for the extreme stimuli both for
the magnitude estimations and the 
category ratings. 


Whereas the constancy of the SDs 
of the category ratings is built into 
the procedure of Fechnerian integra- 
tion, so that the integral is not 
affected by the bias regarding cate- 
gory ratings, the same bias in the 
magnitude estimations will distort the 
integral, since it is in effect their 
SDs that are integrated. In the 
studies on loudness, softness, and 
smell, the bias could be remedied by 
finding an equation for the Weber 
function, but this way out was not 
open in the present investigation. 

A remarkable outcome is the con- 
nection between the power function 
and the Fechner integral. In the 
region where the power function does 
not apply, neither does the Fechnerian 
integration. If however the power 
function is extended over the whole 
range, the Fechnerian integration can 
be extended correspondingly. 

The sigmoid form of the ψ-φ function
is somewhat puzzling; perhaps
the observers changed their frame of 
reference; the short lines may have 
been experienced as rectangles rather 
than as lines, especially when they 
were watched for a long time. For 
the shortest exposure only the point 
corresponding to the smallest stimulus 
deviates from the power function. In 
category rating no such change of
frame of reference seems to have taken 
place, probably because discrimina- 
tion, the basis of category scales, is 
not affected. The curves relating 
category values to physical values are 
smooth (Figure 2—Ib, IIb, IIIb) and 
the curves showing the category scale 
as a function of the magnitude scale 
are smooth for the corrected, but some- 
what irregular for the uncorrected, 
points (Figure 4—Ib, IIb, IIIb). 

As Figure 3 (Ia, IIa, IIIa) shows,
the ψ intercept shifts regularly with
exposure time. I had expected a shift 
in the opposite direction, that is to 
say, that the absolute values of the 
intercepts would increase with increas- 
ing exposure time, and thus that the 
continuum would appear less and less 
prothetic. The data show that the 
interplay of the three parameters
(ψ intercept, slope, and form of the
Weber function) may keep the curvature
of the plots fairly constant, as in
Figure 4 (Ib, IIb, IIIb), even though 
the parameters change. The intersection
of the Weber function with the ψ
axis seems to constitute some kind of
anchoring point for the observer, ap- 
proaching zero—the expected anchor- 
ing point—as the judgments become 
more precise, i.e., with longer exposure 
time. Similar observations regarding 
anchoring points have been made be- 
fore (see, e.g., Reese, 1953, p. 13; 
Eisler, 1960). 

In an earlier study (Eisler, 1962b) 
I have shown that, certain assump- 
tions granted, the category scale must 
be a logarithmic function of the 
magnitude scale. I also suggested
that another way of looking at the 
problem was to consider the category 
scale as a Fechner integral with 
respect to subjective magnitude. 
When Weber’s law holds, the two 
points of view are indistinguishable. 
The experiment on lengths of lines 
demonstrates that basically the cate- 
gory scale is a Fechner integral, 
whatever the form of the Weber func- 
tion. Only when the Weber function 
is Weber’s law, do we obtain a 
logarithmic function; when the Weber
function is the generalized form of 
Weber’s law, we obtain the corrected 
log function given in Equation 6. 


CONCLUDING REMARKS 


In my first paper dealing with 
category scales (Eisler, 1962b) I listed 


some of the views encountered in the 
literature on this topic. In principle, 
there were two points of view. 

The first, whose most prominent 
protagonist is S. S. Stevens, claims 
that the category scale is in effect a 
distorted magnitude scale. According 
to Stevens’ argument the observers 
are unable to free themselves com- 
pletely from the influence exerted by 
the variation of discrimination with 
subjective magnitude. Were dis- 
crimination the only determining fac- 
tor, then the function relating the 
category to the magnitude scale would 
be purely logarithmic, as the function 
relating the jnd scale to the magnitude 
scale really is for prothetic continua 
(Stevens, 1957). Since it is proposed 
in this paper that the category scale is 
exclusively determined by the ob- 
server’s discrimination, the difference 
between the nonlogarithmic category 
and the logarithmic jnd scale needs 
some comments. Obviously these two 
scales are based on two different 
measures of the observer's uncer- 
tainty, since the logarithmic jnd scale 
requires Weber’s law to hold, whereas 
any other discrimination scale is 
incompatible with the Weber law 
proper.³ This difference may be
attributed either to the data them- 
selves (stemming from the constant 
method and magnitude estimation, 
respectively) or to the treatment of 
the data. Whereas for the derivation 
of the category scale intraindividual 
SDs are used, the jnd scale is generally 
based on a confusion of intra- and 
interindividual response frequencies. 
Variability derived from this in- 
consistent mixture seems to conform 
to the pure Weber law. 


3 The argument presupposes the correctness
of the psychophysical power law. When
ψ = kφⁿ and the jnd scale u = α log φ + β,
then u = α log (ψ/k)^(1/n) + β = (α/n) log ψ + β′.
Representatives of the second point
of view consider the category scale to 
be a pure log function of the magni- 
tude scale without paying attention 
to the empirical deviations. Their 
explanations differ somewhat and a 
discussion of them is given in Eisler 
(1962b). 

A few other papers touching on the 
problem of category scales should be 
mentioned. Rosner (1962) starts 
out from the observation that cate- 
gory scales and magnitude scales do 
not agree within a linear transforma- 
tion. He suggests as a possible 
explanation that intervals are psy- 
chologically meaningless, and hence 
that the numbers used as scale values 
in subjective ratio scales “lie in the 
Abelian multiplicative group of posi- 
tive real numbers and not in the field 
of real numbers, where both addition 
and multiplication are possible and 
related by the distributive law.” 
Psychophysical experiments carried 
out by Goude (1962), however, de- 
monstrate that observers are perfectly 
capable of performing addition and 
subtraction of subjective magnitudes 
consistently with ratio estimation. 
Obviously, it is the task of category 
rating that the observer conceives of 
in a way that is inconsistent with the 
task of ratio estimation, probably 
regarding it as a task in discrim- 
ination. 

Attneave (1962) analyzes another 
possibility of resolving the discrep- 
ancy between magnitude and category 
scales. According to his suggestion 
the category (equal interval) scale is 
the true “genotypical” scale. The 
magnitude scale is obtained essentially 
through the matching of sensations to 
numbers, and if the psychophysical 
function for numbers is nonlinear then 
the phenotypical function obtained 
by, e.g., magnitude estimation cannot 
agree with the genotypical function.
He derives about the same exponent,
0.4, for the number continuum from 
a comparison of category and magni- 
tude scales for weight and loudness. 
However, it is hard to see how the 
number continuum could become 
linear when used with metathetic 
continua. For such continua the two 
types of scales agree (Stevens, 1957), 
a fact that would follow from the 
Fechnerian model and the constancy 
of discriminability in these continua. 

According to my own view, the 
category scale is a real Fechnerian 
discrimination, or, perhaps better, an 
equiprecision scale (Helm, Messick, & 
Tucker, 1961). Its deviation from the 
log function is caused by a correspond- 
ing deviation of the Weber function 
from Weber’s law. Its deviation from 
the magnitude scale is to be expected, 
since discrimination changes with 
magnitude. (The consequences of
this statement for the prothetic-
metathetic dichotomy of perceptual
continua are dealt with in Eisler, 1962b,
1963.) However, the arguments used 
by Fechner in deriving sensation 
scales from stimulus scales and jnd’s 
are based on the postulated equality 
of sensation jnd’s, and the adequacy 
of this postulate can be questioned. 
This is a decision for the theory 
builder. In the category rating 
situation, our observer constructs a 
scale whose units consist of measures 
of his uncertainty, not of units of his 
magnitude scale. And there is nothing 
the experimenter can do about it— 
the customer is always right. 
MEDIATING PROCESSES IN HUMANS AT THE OUTSET
OF DISCRIMINATION LEARNING¹


MARVIN LEVINE 


Indiana University 


An earlier hypothesis model was extended to describe the set of mediat- 
ing processes (Hs) employed by the adult human S. The model now 
deals with behavior during 2 types of discrimination problems: Out- 
come problems (e.g., E says “right” or “wrong” after each response) 
and Nonoutcome problems (E says nothing). 2 classes of Hs are 
stipulated: Predictions (determinants of attempts to maximize re- 
wards) and Response-sets (determinants of systematic, nonsolution 
behavior). Analysis of data from 2 experiments show: (a) the model 
closely predicts behavior during Nonoutcome problems from behavior 
during Outcome problems, (b) the Nonoutcome problem may be used 
as a probe to determine the H an individual S is holding on a particular 
problem, and (c) the learning-set function can be predicted from 


knowledge of the Hs. 


The conception central to the model 
to be delineated has been suggested 
in a wide variety of sources (Bruner, 
1951; Hovland, 1952; Kendler & 
Kendler, 1962; Restle, 1962—to name 
but a few recent ones). The concep- 
tion is that the adult human starts 
a problem with a mediating process 
(“prediction,” “hypothesis,” “expectancy,”
“set” are alternative terms
which have been employed) and that 
this mediating process affects the 
overt behavior in specifiable ways. 
The model will describe the set of such 
processes available at the outset of 
short discrimination problems, will 
lead to techniques for evaluating 
their frequency, and will permit the 
deduction of novel theorems about 
behavior. Later sections of this paper 
will present experimental tests of 
these theorems. 

The present formulation is a de- 
velopment from a model (Levine, 
1959) of the behavior of monkeys 

1 The research was supported in part by a 
grant from the Indiana University graduate 
school. The author also gratefully acknowl- 
edges the assistance of Zeynep Man and 
Martin Richter with Experiments I and II, 
respectively. 


engaged in a series of three-trial 
discrimination problems. The funda- 
mental assumption in that model was 
that there is a set of Hs (short for 
hypotheses, after Krechevsky, 1932), 
defined as systematic response pat- 
terns, one of which is chosen by the 
subject on each problem. Examples 
of Hs are “Position Preference” 
(defined as repeated response to one 
position), “Stimulus Preference” (defined
as repeated response to one
stimulus), and “Win-stay-Lose-shift
with respect to the stimulus” (defined 
as response to the stimulus correct 
on the preceding trial). That model 
yielded a method of analysis for 
determining the frequency of occur- 
rence of the various Hs. It became 
possible to conclude, for example, that 
Position Preference occurred on 18% 
of the problems and to decompose the 
learning-set function into the under- 
lying set of H functions. 

The same assumption, that there is 
a set of Hs among which the subject 
chooses, and the same general method 
of analysis will be applied to adult 
human behavior over a series of 
discrimination learning problems. 
The hypothesis, then, will continue to
be the basic dependent variable, and 
will continue to be symbolized by H. 
There will be, however, a shift in the 
definition of this symbol. Whereas it 
had been previously defined as a 
response pattern, it will hereafter 
be defined as the determinant of a 
response pattern, as a mediating 
process which results in the particular 
response pattern. The rationale for 
such a change will become clear as the 
model is presented and will be dis- 
cussed following the presentation of 
the experiments. 

The human experimental situation 
to which the analysis will be applied 
incorporates the general procedures 
of the learning-set experiment (Har- 
low, 1959). In the most usual form 
of that experiment two stimulus 
objects are presented simultaneously 
for a few trials. Typically, the 
stimuli are relevant (response to one 
of them is consistently rewarded) for 
these few trials; their positions, which 
reverse on 50% of the trials, are 
irrelevant. A series of problems are 
presented, new stimuli being used for 
each problem. 

The present experiments will differ 
from the previous experiments in one 
important detail: the experimenter 
will have two procedures instead of 
one. In the typical discrimination 
learning experiments with monkeys 
the experimenter uses a single pro- 
cedure: one of the two responses is 
always rewarded, the other not. The 
analogous procedure with humans is 
that one of the responses is followed 
by the word “right,” the other by the 
word “wrong.” With humans, the 
experimenter may have a second 
procedure: “blank” trials may be 
presented, i.e., during some problems 
the experimenter may say nothing. 

There are then two types of problems:
Outcome problems (the experimenter
says “right” or “wrong”
after each response) and Nonoutcome 
problems (the experimenter says noth- 
ing after each response). The subject 
will have been instructed that both 
types of problems would occur and 
that he is to try to obtain 100% 
correct in either case. A major aim 
of the model will be the prediction of 
behavior during Nonoutcome prob- 
lems from behavior during Outcome 
problems, and vice versa. 

The model will be developed to 
specify the Hs with which the subjects 
start any problem. The description 
of subsequent Hs involves arbitrary 
assumptions about resampling of Hs 
and will not be considered here. As a
result, behavioral data only through 
the first two trials of Outcome prob- 
lems will be considered. The initial 
experiment to which the model will be 
applied will consist of precisely two- 
trial problems. The Hs at the outset 
of longer problems will be considered 
in the second experiment. 

A two-trial problem, the subject’s 
behavior, and the experimenter’s be- 
havior will be summarized by a few 
symbols. If the two stimuli maintain 
the same positions on Trials 1 and 2 
the problem type will be described 
as “A”; if they reverse positions from
Trials 1 to 2 the problem type will be
described as “B.” The response
sequence will be described as Iₛ if the
subject chooses the identical stimulus
on Trial 2 to the one chosen on
Trial 1. Choice of one then the other
stimulus will be described as Oₛ. From
knowledge of the problem type (A or
B) and of the subject’s sequence of
stimulus selection (Iₛ or Oₛ) one may
deduce his sequence of position selections.
Nevertheless, it will be useful
to symbolize also the sequence of
position responses. Response to the
same position for the two trials will
be described as Iₚ; response to one
then the other position will be described
as Oₚ. The symbols +₁ and
−₁ will be used to denote that the
experimenter said “right” or “wrong,”
respectively, following the response on
Trial 1. The symbol pair +₁Iₛ means
that the subject chose a stimulus on
Trial 1, the experimenter said “right,”
and the subject chose the same stimulus
on Trial 2. The symbol pairs
+₁Oₛ, −₁Iₛ, and −₁Oₛ are analogously
defined.

The problem type (A or B) and 
these four symbol pairs describe all 
possible relevant outcome-response 
sequences for a single Outcome problem.
On a single Nonoutcome problem
the problem type and the symbols
Iₛ and Oₛ describe all possible response
sequences. 
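The deduction of the position sequence from the problem type and the stimulus sequence can be made explicit in a few lines. The function name and the string codes "Is", "Os", "Ip", and "Op" (standing for the subscripted symbols in the text) are hypothetical conventions adopted for this sketch.

```python
def position_sequence(problem_type, stimulus_sequence):
    """Deduce the position sequence on a two-trial problem.

    problem_type: "A" (stimuli keep their positions) or "B" (they reverse).
    stimulus_sequence: "Is" (same stimulus chosen twice) or "Os" (one, then the other).
    Returns "Ip" (same position twice) or "Op" (one position, then the other).
    """
    if problem_type not in ("A", "B"):
        raise ValueError("problem_type must be 'A' or 'B'")
    if stimulus_sequence not in ("Is", "Os"):
        raise ValueError("stimulus_sequence must be 'Is' or 'Os'")
    positions_kept = problem_type == "A"
    stimulus_kept = stimulus_sequence == "Is"
    # The position repeats exactly when both are kept or both change.
    return "Ip" if positions_kept == stimulus_kept else "Op"

for problem in ("A", "B"):
    for stimuli in ("Is", "Os"):
        print(problem, stimuli, position_sequence(problem, stimuli))
```

On an Outcome problem the same lookup applies after the Trial 1 outcome symbol is set aside, since the outcome does not alter where the chosen stimulus was located.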


DISCRIMINATION MODEL 


As already indicated, the basic 
conception will be that during any one 
problem the behavior of a subject is 
determined by an H, defined as a 
determinant of systematic responding, 
and to be interpreted as a prediction 
by the subject or his set. The Hs 
available to a subject in a two-trial 
simultaneous discrimination problem 
will now be considered. In this 
situation Hs have three character- 
istics. The first is that an H may be 
contingent either on the stimuli or on 
the positions. For example, a subject 
may Predict that one of the two 
stimuli is always correct (the subject 
Predicts that one of the stimuli is 
correct and will repeat) or that one 
of the two positions is always correct 
(the subject Predicts that one of the 
positions is correct and will repeat).² There are then two classes of
Hs, half directed toward stimuli, half toward positions. The second
feature of Hs is that they are sequence oriented. For example, a
subject may Predict that one of the two stimuli is always correct (the
subject Predicts that one of the stimuli is correct and will repeat) or
that the correct stimulus changes from Trial 1 to Trial 2 (the subject
Predicts that the correct stimulus will alternate).

² The word predict will have two usages: as subject’s mediating
process, or as the usual outcome of theoretical analysis. To help
distinguish the two usages, the former will be capitalized.

The stimulus-position breakdown 
combined with the repeat-alternate 
breakdown yields four Hs. Con- 
sideration of the third feature of Hs 
will increase the number to eight. 
In all the examples just given it was 
assumed that the subject Predicts an 
event sequence for the first two trials. 
It is unrealistic to assume that this 
process is the only source of systematic 
behavior. It is conceivable that 
systematic response to a position or a 
stimulus may occur in other ways. 
Consider the following examples: The 
subject does not care about this 
experiment and wants to get out as 
fast as he can. He decides to choose 
always the stimulus on the left side; 
one of the two stimuli is repulsive to 
the subject and he will always choose 
the other even though the experimenter
always says “wrong”; the subject
decides to alternate stimuli (or
positions) for want of anything better 
to do. 

It is clear that these sets will yield 
systematic behavior, but it is equally 
clear that these processes are different 
from Predictions about correct events. 
One may also surmise that the result- 
ing response patterns should be differ- 
ent from the patterns produced by 
Predictions. The distinction will be 
made, therefore, between Prediction 
Hs and Response-set Hs, the latter 
term referring to Hs of the sort 
described in the last three examples. 

The three two-way classifications (stimulus, position; perseveration,
alternation; Prediction, Response-set)
lead to eight Hs, each of which may
now be described. In order, however, 
to relate each H to behavior it is 
necessary to introduce two postulates, 
one describing behavior if a Prediction 
H exists, one if a Response-set H 
exists. 

Prediction postulate: If a subject 
Predicts how a series of rewards will 
occur he behaves so that if the Pre- 
diction were correct rewards would 
be maximized. 

Suppose, for example, that a subject 
Predicts that one of the stimuli is
correct and repeats, i.e., will be correct 
on both trials of the problem. If his 
Prediction is correct he can insure a 
“right” on Trial 2 of an Outcome 
Problem by following this rule: If the 
experimenter says “right” after Trial 
1 choose the same stimulus on Trial 2; 
if the experimenter says “wrong” after 
Trial 1 choose the other stimulus on 
Trial 2. Thus, on Outcome Problems 
the Prediction that one of the stimuli 
is correct and repeats will lead to the 
behavior pattern formerly described 
(Levine, 1959) as “Win-stay-Lose-shift
with respect to the stimulus.”

On a Nonoutcome problem the 
Prediction postulate means simply 
that the subject strives to obtain 
100% correct. A subject with the 
Prediction that one of the stimuli is 
correct and repeats would choose the 
same stimulus for all trials of the 
problem. No other behavior pattern 
can, if this Prediction is correct, 
yield 100% correct. 

The four Prediction Hs are described below. The behavior
manifestations are given for Outcome problems and for Nonoutcome
problems, followed by the summary form in parentheses:

Hypothesis a ($H_a$): The subject Predicts that one of the stimuli is
correct and will repeat. Outcome-problem behavior: The subject shows
“Win-stay-Lose-shift with respect to the stimulus” ($+_1 I_s$ or
$-_1 O_s$). Nonoutcome-problem behavior: The subject chooses the same
stimulus on Trials 1 and 2 ($I_s$).

Hypothesis a’ ($H_{a'}$): The subject Predicts that the correct
stimulus will alternate. Outcome-problem behavior: The subject shows
“Win-shift-Lose-stay with respect to the stimulus” ($+_1 O_s$ or
$-_1 I_s$). Nonoutcome-problem behavior: The subject chooses one
stimulus on Trial 1 and the other stimulus on Trial 2 ($O_s$).

Hypothesis b ($H_b$): The subject Predicts that one of the positions is
correct and will repeat. Outcome-problem behavior: The subject shows
“Win-stay-Lose-shift with respect to position” ($+_1 I_p$ or
$-_1 O_p$). Nonoutcome-problem behavior: The subject chooses the same
position on Trials 1 and 2 ($I_p$).

Hypothesis b’ ($H_{b'}$): The subject Predicts that the correct
position will alternate. Outcome-problem behavior: The subject shows
“Win-shift-Lose-stay with respect to position” ($+_1 O_p$ or
$-_1 I_p$). Nonoutcome-problem behavior: The subject chooses one
position on Trial 1 and the other position on Trial 2 ($O_p$).

Response-set postulate: If a subject 
has a Response-set H the behavior 
has the pattern described by that H 
and is independent of outcomes. 

Suppose, for example, that the 
subject has the set that he will always 
choose the stimulus on the left. He 
will manifest a sequence of responses 
to the left side, no matter what kind 
of outcomes the experimenter presents. 

The four Response-set Hs are 
described below. The behavior mani- 
festation, which is the same for both 
Outcome and Nonoutcome problems, 
is given for each H, followed by the 
summary form in parentheses.

Hypothesis x ($H_x$): The subject has a Response-set to repeat the same
stimulus. Behavior: Stimulus Preference, i.e., the subject chooses the
same stimulus on Trials 1 and 2 ($I_s$).

Hypothesis x’ ($H_{x'}$): The subject has a Response-set to alternate
stimuli. Behavior: The subject chooses one stimulus on Trial 1 and the
other stimulus on Trial 2 ($O_s$).

Hypothesis y ($H_y$): The subject has a Response-set to repeat the same
position. Behavior: Position Preference, i.e., the subject chooses the
same position on Trials 1 and 2 ($I_p$).

Hypothesis y’ ($H_{y'}$): The subject has a Response-set to alternate
positions. Behavior: The subject chooses one position on Trial 1 and
the other position on Trial 2 ($O_p$).
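
The two postulates and the eight Hs can be summarized in a small sketch that returns the response sequence each H would produce. The table `HYPOTHESES`, the function `response_sequence`, and the outcome codes `"+"`, `"-"`, and `None` (for a Nonoutcome problem) are hypothetical names chosen for this illustration only.

```python
from typing import Optional

# Each H is described by: the cue it concerns, its tendency, and its kind.
HYPOTHESES = {
    "a":  ("stimulus", "repeat",    "prediction"),
    "a'": ("stimulus", "alternate", "prediction"),
    "b":  ("position", "repeat",    "prediction"),
    "b'": ("position", "alternate", "prediction"),
    "x":  ("stimulus", "repeat",    "response_set"),
    "x'": ("stimulus", "alternate", "response_set"),
    "y":  ("position", "repeat",    "response_set"),
    "y'": ("position", "alternate", "response_set"),
}


def response_sequence(h: str, outcome: Optional[str]) -> str:
    """Return 'Is', 'Os', 'Ip', or 'Op' for hypothesis h.

    outcome is '+' ("right" after Trial 1), '-' ("wrong" after Trial 1),
    or None for a Nonoutcome problem.
    """
    if outcome not in ("+", "-", None):
        raise ValueError("outcome must be '+', '-', or None")
    cue, tendency, kind = HYPOTHESES[h]
    repeat = tendency == "repeat"
    # Prediction postulate: after "wrong" the subject reverses the choice that
    # the Prediction would otherwise dictate, so as to maximize reward.
    # Response-set postulate: behavior is independent of outcomes.
    if kind == "prediction" and outcome == "-":
        repeat = not repeat
    suffix = "s" if cue == "stimulus" else "p"
    return ("I" if repeat else "O") + suffix
```

Under these assumptions, `response_sequence("a", "-")` gives `"Os"` (Lose-shift), while `response_sequence("x", "-")` gives `"Is"`, since a Response-set ignores the outcome.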

A detail to note is that the Hs have 
been paired off according to the words 
“repeat” and “alternate.” Any such 
pair will be described as complemen- 
tary, because both members of the 
pair dovetail in their manifestations, 
yielding all possible behavior patterns. 
This relation is denoted by employing 
for a given pair the same subscript 
letter and distinguishing the two by 
primes (for example, $H_a$ and $H_{a'}$).
The effect of complementarity will 
become explicit when the evaluation 
of the H strengths is discussed. 


EVALUATION OF THE H PROBABILITIES 


Corresponding to each H will be a 
probability denoting the theoretical 
proportion of times that the H is 
selected. The subscript letter will be 
employed to represent the probability, 
i.e., $P(H_a) = a$, $P(H_{a'}) = a'$, etc.

The assumption will be made that 
the Hs are mutually exclusive and 
exhaustive, i.e., that only one of the 
Hs occurs to a given subject on a 
given problem. This assumption 
permits the following statement: 


$a + a' + b + b' + x + x' + y + y' = 1.00$. [1]

Given this additional assumption, a
technique is available (Levine, 1959) 
for evaluating H probabilities from 
Outcome problem data. A set of 
equations is derived in which these 
probabilities are expressed as func- 
tions of the obtained frequencies of 
the outcome-response sequences. Ap- 
pendix A presents this technique, in 
more general form than previously, 
as part of the attempt to solve for $a$, $a'$, $b$, ..., $y'$. It is
shown there that in the present case a complete solution is not
possible, that there is available only a relative solution which
describes by how much one of a complementary pair of Hs exceeds the
other. That is, one may solve only for:

$D_a = a - a'$;  $D_b = b - b'$;
$D_x = x - x'$;  $D_y = y - y'$.

Any of the $D_i$ may range from −1.0 (if only one H involving
alternation is employed) to +1.0 (if only one H involving perseveration
is employed). For example, $D_a = +1.0$ means that, for the block of
problems considered, the subjects always have $H_a$, the Prediction
that the correct stimulus perseverates. $D_a = 0$ means that $a = a'$.
They may both be 0 or as much as .5.


PREDICTION OF RESPONSE PATTERNS 


Appendix A shows that the $D_i$ are obtained as numerical values from
the Outcome problem data. It is possible to employ these values to
predict the behavior of the subjects during the Nonoutcome problems.
The theory may be developed to predict $NI_s$, the number of times that
the subject chooses the same stimulus for both trials of a problem in a
block of Nonoutcome problems.

Appendix B presents this development of the theory for the condition
when the numbers of A and B Nonoutcome problems are the same. It is
shown there that

$NI_s = (T/2)(D_a + D_x + 1)$, [2]


where $T$ represents the total number of Nonoutcome problems under
consideration, $D_a$ reflects the degree to which the subject Predicts
stimulus repetition (if $D_a$ is positive) or alternation (if $D_a$ is
negative), and $D_x$ reflects the degree to which the subject has a
Response-set to repeat or alternate stimuli (if $D_x$ is positive or
negative, respectively).

Embodied in Equation 2 is the strategy for testing the model. It is
worth repeating, therefore, that $D_a$ and $D_x$ are numbers obtained
from the Outcome problems and $NI_s$ is the theoretical number of
repeated stimulus selections in a block of Nonoutcome problems. The
test consists in comparing this theoretical number with the obtained
number.
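
As a rough illustration of how Equation 2 turns the two difference scores into a predicted count, consider the sketch below. The function name `predicted_repeats` is an assumption, and the value of $D_x$ used in the usage note is a placeholder of zero, not a value reported in the text.

```python
def predicted_repeats(total_problems: float, d_a: float, d_x: float) -> float:
    """Equation 2: NI_s = (T / 2) * (D_a + D_x + 1).

    total_problems: T, the number of Nonoutcome problems considered.
    d_a: D_a = a - a', excess of stimulus-repeat over stimulus-alternate Predictions.
    d_x: D_x = x - x', excess of stimulus-repeat over stimulus-alternate Response-sets.
    """
    for name, value in (("d_a", d_a), ("d_x", d_x)):
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between -1.0 and +1.0")
    return (total_problems / 2.0) * (d_a + d_x + 1.0)
```

With $T = 600$, $D_a = +.80$, and $D_x$ set to zero purely for illustration, `predicted_repeats(600, 0.80, 0.0)` comes to approximately 540 repeated selections. When both differences are zero the prediction is $T/2$, i.e., repetition on half of the problems.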


EXPERIMENT I
Method 


Subjects. Eighty students from the introductory psychology courses at
Indiana University served as subjects.

Apparatus. The stimuli were 180 different three-letter nonsense
syllables selected from Glaze’s (1928) lists. Pairs of syllables were
randomly selected to form the stimulus objects for a given trial, and
were typed .5 inch apart on a 3 × 5 inch card. The same two syllables
on a pair of cards constituted the materials for a problem, and a deck
of 90 such pairs of cards constituted the materials for the experiment.

Design. A learning-set experiment consisting of 90 two-trial problems
was presented under four experimental conditions to four groups of 20
subjects. The four conditions differed along a continuum defined by the
experimenter’s manner of reinforcement. At one end of this continuum is
the standard learning-set procedure in which the experimenter selects
the stimulus to be correct, and reinforces responses only to it on each
trial of the problem. This procedure holds for every problem in the
experiment. That is, the correct stimulus is the same for both trials
on 100% of the problems. The group receiving this condition will be
referred to as G-100. At the other end of the continuum is a procedure
in which the stimulus designated as correct by the experimenter
alternates from Trial 1 to Trial 2, a procedure first utilized by Behar
(1961) and designated as an Alternation Learning-Set. The correct
stimulus is never the same for the two trials of any problem, or,
conversely, is the same for 0% of the problems. The group receiving
this condition will be referred to as G-0. In between the two extremes
the experimenter may follow one procedure or the other for as many
problems as he wishes. For the two remaining conditions the correct
stimulus was the same during 80% of the problems, changing during the
remainder (G-80), and the correct stimulus was the same during 20% of
the problems, changing during the remainder (G-20). The four groups,
then, were G-100, G-80, G-20, and G-0, where the number describes the
percentage of problems on which the experimenter caused the correct
stimulus to perseverate.

Procedure. Each subject was shown a 
sample 3 × 5 inch card containing two syllables
and was instructed that he would re-
ceive a deck composed of similar cards and 
that he was always to choose one of the two 
syllables on each card. He was further told 
that the experimenter would say “right” or 
“wrong” after each choice, and that he was to
try to be right as often as possible. He was 
then given the deck face down and was in- 
structed to turn the cards one at a time, mak- 
ing his response for each. The placement of 
cards after each trial was arranged so that 
they could not again be seen. 

After either 5 or 10 Outcome problems the 
experimenter stopped the subject and in- 
structed him that there would now be a test 
of how much had been learned thus far. The 
subject was told that during the next few 
cards the experimenter would not say any- 
thing, that because this was a test he was to 
try to get 100% correct. The subject then 
proceeded to choose syllables on the next 10 
cards (5 problems) with the experimenter 
saying nothing. After these 5 problems the 
experimenter announced that the learning 
would be resumed and presented outcomes 
during the next 10 problems. Test instruc- 
tions then followed, etc. Ten Outcome 
problems continued to alternate with 5 
Nonoutcome problems until 90 problems— 
60 with, 30 without outcomes—had been 
presented. 


Results 
Figure 1 shows the $D_i$, the difference between complementary Hs,
computed from all the Outcome problems for each group. The figure shows
that neither Predictions about positions nor Response-sets show greater
excess in perseveration or alternation, i.e., the $D_b$, $D_x$, and
$D_y$ fluctuate around zero. The $D_a$, on the other hand, vary
(linearly, oddly enough) from −.47 for G-0 (the subjects in this group
Predict that the correct stimulus alternates for a minimum of 47% of
the problems) to +.80 for G-100 (these subjects Predict that the
correct stimulus repeats for a minimum of 80% of the problems).

FIG. 1. The value of the $D_i$ computed from the block of 60 Outcome
problems for each group (the groups are ordered and spaced according to
the percentage of problems on which the correct stimulus perseverated
for the two trials).

The values of $D_a$ and $D_x$ may be inserted into Equation 2 to
predict the number of Nonoutcome problems in which the same syllable
will be selected for both trials. The prediction will be made for each
group, so that $T$, the total number of Nonoutcome problems under
consideration, will equal 600 (20 subjects × 30 Nonoutcome problems per
subject). The predicted and obtained frequencies are presented in
Figure 2. This figure has two noteworthy features. One is that the
obtained values are exceedingly close to the predicted values. None of
the discrepancies from the predictions are statistically significant.
The other is that there is a bias in favor of repeating syllables. If a
group had been run in which the experimenter caused the correct
syllable to perseverate on 50% of the problems and to alternate on the
other 50%, i.e., if the experimenter had said “right” and “wrong”
totally at random, then one would predict (from interpolation in
Figure 2, rather than from theoretical considerations) that the subject
would repeat syllables for about 360 (= 60%) of the Nonoutcome
problems. The fact that with this insoluble procedure $D_a$ would
appear to be about +.15 (from interpolation in Figure 1) indicates that
this repetition bias results primarily from the subject’s tendency to
Predict that the correct stimulus repeats rather than from a
Response-set to repeat stimuli.

FIG. 2. [...] all the Nonoutcome problems for each of the four groups.

The validity of Equation 2 may be demonstrated in a more detailed
fashion by consideration of the learning-set functions. These are
typically plotted from Outcome problems as $P(+_2)$, the proportion of
correct responses on Trial 2, at successive stages of the experiment.
From considerations similar to those employed previously (Levine, 1959)
it may be demonstrated that the theoretical proportion correct on
Trial 2 for a given block of Outcome problems is given by:

$P(+_2) = [1 + (2q - 1)D_a]/2$ [3]


in which q is the proportion of 
problems on which the correct stimu- 
lus perseverates (for G-100, G-80, 
G-20, and G-0 the value of q is 1.0, 
.8, .2, and .0, respectively). 

If Equation 2 is solved for $D_a$ and the resulting expression
substituted into Equation 3 the latter becomes:

$P(+_2) = [1 + (2q - 1)\{(2NI_s/T) - D_x - 1\}]/2$.


If the assumption is now made that $D_x = 0$, i.e., that the
Response-set to repeat stimuli occurs as often as the Response-set to
alternate stimuli, for any block of problems considered, then the last
equation becomes:

$P(+_2) = 1 - q + (NI_s/T)(2q - 1)$. [4]


The symbols q and T are both de- 
fined by the experimenter, so that 
the proportion correct on Trial 2 of 
Outcome problems is given as a linear 
function of the number of repeated 
stimulus selections during Nonout- 
come problems. Thus, from Non- 
outcome problem data the conven- 
tional learning-set function can be 
predicted. 
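
The chain from Equation 2 through Equation 3 to Equation 4 can be checked numerically with a short sketch. The three function names are hypothetical, and the check simply confirms that substituting the solved form of Equation 2 (with $D_x = 0$) into Equation 3 reproduces Equation 4.

```python
def p_correct_from_d_a(q: float, d_a: float) -> float:
    """Equation 3: proportion correct on Trial 2 of Outcome problems."""
    return (1.0 + (2.0 * q - 1.0) * d_a) / 2.0


def d_a_from_repeats(ni_s: float, total: float, d_x: float = 0.0) -> float:
    """Equation 2 solved for D_a: D_a = 2 * NI_s / T - D_x - 1."""
    return 2.0 * ni_s / total - d_x - 1.0


def p_correct_from_repeats(q: float, ni_s: float, total: float) -> float:
    """Equation 4 (assumes D_x = 0): P(+2) = 1 - q + (NI_s / T) * (2q - 1)."""
    return 1.0 - q + (ni_s / total) * (2.0 * q - 1.0)


def equations_agree(q: float, ni_s: float, total: float, tol: float = 1e-12) -> bool:
    """True when Equation 4 matches Equation 3 fed by Equation 2 with D_x = 0."""
    via_eq3 = p_correct_from_d_a(q, d_a_from_repeats(ni_s, total))
    return abs(via_eq3 - p_correct_from_repeats(q, ni_s, total)) < tol
```

For any fixed $q$ and $T$ the prediction is linear in $NI_s$; at $q = .5$ the slope vanishes and the predicted proportion correct is .5 regardless of the number of repetitions, as would be expected when outcomes are unrelated to the stimulus.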

The predicted and obtained learning-set functions are presented in
Figure 3. A prediction was separately made for each block of 20 Outcome
problems from the 10 Nonoutcome problems distributed in the block
($T$ = 10 problems per subject × 20 subjects = 200 problems). The pairs
of points are close not only by inspection but also by statistical
considerations. None of the differences between the obtained and
predicted values are statistically significant at the .05 level. The
largest CR, for problems 21-40 of G-20, is just short of significance
(p = .052).

FIG. 3. Obtained (solid lines) and predicted (dashed lines)
learning-set functions showing the mean percentage of correct responses
on Trial 2 during blocks of 20 Outcome problems (the G-0 curves were
plotted on separate axes to avoid clutter).


THE n-DIMENSIONAL PROBLEM 


The preceding experiment demon- 
strates that for the two-trial simul- 
taneous discrimination a close relation- 
ship exists between behavior during 
Outcome problems and during Non- 
outcome problems. The model of 
mediating processes which has been 
elaborated provides a rationale for 
this relationship. In general, obtain- 
ing the H probabilities for one out- 
come condition permits a close pre- 
diction of the behavior patterns in the 
other condition. 

The model was described, however, 
only for the situation in which two 
stimuli (for example, two different 
nonsense syllables) could occupy one 
of two positions: there were two 
dimensions with two values, or cues, 
for each dimension. The remainder of the paper will deal with
transcending this limitation. It will be demonstrated how the model may
be generalized to obtain H strengths at the outset of the n-dimensional
learning situation.

The analysis for the n-dimensional 
problem will be reduced relative to 
the analysis described in the preceding 
sections for the two-dimensional prob- 
lem. In that analysis two general 
categories of Hs were described: 
Predictions ($H_a$, ..., $H_{b'}$) and Response-sets ($H_x$, ...,
$H_{y'}$). The category of Predictions also had two types of Hs:
Predictions that the correct value of one of the dimensions repeated
($H_a$ and $H_b$) and Predictions that the correct value followed a
sequential pattern (specifically, for the two-dimensional problem,
alternation: $H_{a'}$ and $H_{b'}$). The present model
will be “reduced” in that it will at- 
tempt to evaluate only the first type 
of Prediction, i.e., only those Hs which 
are Predictions that one of the values 
of one of the dimensions is repeatedly 
correct. Response-set Hs and Hs 
which are Predictions of more complex 
sequential events will be ignored. In 
effect the assumption will be made 
that the most important type of 
mediating process which the adult 
human subject has at the outset of a 
problem of several dimensions consists 
in an attempt to locate the cue which 
is invariantly (within that problem) 
the basis for correct responding. 

This assumption, incidentally, is 
not unique to the present treatment. 
In a common type of concept forma- 
tion experiment the n-dimensional 
problem is employed and this assump- 
tion is more or less explicit in the 
analysis (for example, Bourne & 
Restle, 1959; Brown & Archer, 1956; 
Bruner, Goodnow, & Austin, 1956; 
Grant & Curran, 1952; Hovland, 
1952). For this reason the assump- 
tion will be referred to as the concept 
formation (CF) assumption. The 
model incorporating this assumption 
will be referred to as the CF model.

The a priori justification for the CF assumption comes from three
considerations:

1. The preceding experiment showed that the Response-set H differences,
$D_x$ and $D_y$, were uniformly near zero. One could have assumed that
the Response-set Hs were zero without seriously altering the
predictions. Also, the outright assumption that $D_x$ was zero did not
seem to impair the quality of prediction from Nonoutcome to Outcome
problems (see Figure 3).

2. There is some evidence (Good- 
now & Postman, 1955) that as the 
stimulus situation becomes multi- 
dimensional the subjects avoid com- 
plex sequence behaviors (resulting, 
for example, from single-alternation 
or double-alternation Predictions) in 
favor of locating that cue which is 
consistently to be chosen. 

3. One may, by instructions and pre- 
experimental demonstrations, mini- 
mize Predictions that events follow 
complex patterns. Several investiga- 
tors of concept formation have em- 
ployed this technique (Archer, Bourne, 
& Brown, 1955; Bruner et al., 1956; 
Hovland & Weiss, 1953; Oseas & 
Underwood, 1952). 

The CF model, then, will deal with 
the n-dimensional problem and will 
focus upon Predictions by the subject 
concerning that dimension which de- 
fines correct responding. While the 
model will be presented in general 
form for the n-dimensional problem it 
will be applied specifically to a four- 
dimensional problem. An example of 
such a problem is presented in Figure 
4. The dimensions are form (X versus 
T), position (right versus left), color 
(white versus black), and size (large 
versus small). The figure shows one 
possible sequence of the various levels 
over the four trials. This sequence 
has the special property that each 
value of each dimension appears an
equal number of times with every
value of every other dimension. For 
example, T is black twice, large letters 
appear on the right twice, X is small 
twice, etc. Problems in which the 
dimensions show this kind of balance 
will be described as internally or- 
thogonal. 
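
The balance property just defined can be checked mechanically. The sketch below uses a hypothetical four-trial problem, constructed here only to satisfy the description in the text (it is not a transcription of Figure 4), and a hypothetical helper `is_internally_orthogonal`.

```python
from collections import Counter
from itertools import combinations

# Hypothetical problem: each trial is (left stimulus, right stimulus),
# and each stimulus is (form, color, size). Not copied from Figure 4.
TRIALS = [
    (("X", "white", "large"), ("T", "black", "small")),
    (("X", "black", "small"), ("T", "white", "large")),
    (("T", "white", "small"), ("X", "black", "large")),
    (("T", "black", "large"), ("X", "white", "small")),
]


def is_internally_orthogonal(trials) -> bool:
    """True if every value of every dimension co-occurs equally often
    with every value of every other dimension (position included)."""
    rows = []
    for left, right in trials:
        rows.append(left + ("left",))
        rows.append(right + ("right",))
    n_dims = len(rows[0])
    for i, j in combinations(range(n_dims), 2):
        counts = Counter((row[i], row[j]) for row in rows)
        values_i = {row[i] for row in rows}
        values_j = {row[j] for row in rows}
        expected = len(rows) / (len(values_i) * len(values_j))
        # A missing combination counts as zero and therefore fails the check.
        if any(counts[(u, v)] != expected for u in values_i for v in values_j):
            return False
    return True
```

In this hypothetical problem T is black twice, large letters appear on the right twice, and X is small twice, so the check passes. Repeating the first trial four times would break the balance and the check would fail.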


THE CONCEPT FORMATION MODEL


As already stated, only Predictions 
that the correct cue repeats from trial 
to trial will be evaluated. For any 
given dimension there is only this one 
H. For the problem shown in Figure 
4, for example, there are four Hs 
corresponding to the four dimensions. 
If the subject has a color H at the 
outset of the problem he Predicts that 
one of the colors will be correct from 
trial to trial regardless of which form 
has that color, its size, or position. 
Similarly, if the subject has a size H 
he Predicts that one of the two sizes 
will be correct from trial to trial. 
There are also, in this problem, a posi- 
tion H and a form H. 

One task of the model will be to 
show how one may evaluate in- 
dividually the n Hs which correspond 
to the n dimensions. In addition, a 
residual H will be determined demon- 
strating the pooled effect of other 
mediating processes. For purposes of 
simplicity it will be assumed that this 
Residual H is the pooled strength of 
Predictions about dimensions which 
are not part of the formal structure 
of the problem. Such dimensions 
might be movements by the experi- 
menter, apparatus noises, a flickering 
light, etc.³

³ This is an arbitrary interpretation of the Residual H. Certainly
other processes could be occurring: Response-sets, more complex
Predictions, periodic errors, as well as Predictions about unrecorded
dimensions. The occurrence, if any, and contribution of each of these
is completely unknown, nor have methods of disentangling them for the
n-dimensional problem been developed. The present assumption has the
virtue of providing conceptual consistency and mathematical simplicity.

[Figure 4: panels labeled TRIAL and STIMULI]

FIG. 4. Four trials of a four-dimensional problem.

There are, then, $n + 1$ Hs to be described for the n-dimensional
problem. For the four-dimensional problem described in Figure 4 there
are five Hs:

$H_a$: The Prediction that form is the correct dimension and that the
correct form will repeat.

$H_b$: The Prediction that position is the correct dimension, etc.

$H_c$: The Prediction that color is the correct dimension, etc.

$H_d$: The Prediction that size is the correct dimension, etc.

$H_r$: The Prediction that a nonrecorded dimension is correct.

The probability of any of these Hs will be denoted by the corresponding
subscript symbol. Thus, $P(H_a) = a$, $P(H_b) = b$, etc.

The Hs are related to behavior solely by the Prediction postulate
described for the two-trial problem (the Response-set postulate is
irrelevant for the CF model). The behavioral manifestations of the ith
H follow directly from this postulate:

$H_i$: The subject Predicts that one level of the ith dimension is
correct and will repeat.

Outcome-problem behavior: the subject makes his best guess as to which
of the levels of that dimension is correct on Trial 1. If the response
is correct (for example, if the experimenter says “right”) the subject
chooses the same level of the same dimension on Trial 2; if the
response is incorrect (for example, if the experimenter says “wrong”)
the subject chooses the other level of the same dimension on Trial 2.
This would, in general, be described as “Win-stay-Lose-shift with
respect to the ith dimension.”

Nonoutcome-problem behavior: the
subject makes his best guess as to 
which of the levels of that dimension 
is correct on Trial 1. He chooses the 
same level of that dimension on all 
subsequent trials of the problem. 

There is an important difference in 
the description of the behavior for 
Outcome and Nonoutcome problems. 
During Outcome problems behavior 
is stipulated only for the first two 
trials. This is because the effects 
of “rights” and ‘‘wrongs” from the 
second trial onward involve assump- 
tions about resampling of Hs which 
are beyond the scope of the present 
model. The limitation of relevant 
Outcome problem data to the first two 
trials makes unfeasible, in a problem 
with even as few as four dimensions, 
the solution of the H probabilities in 
the usual manner, i.e., from Outcome 
problems. The reason is, of course, 
that behavior patterns resulting from 
different Hs inevitably overlap during 
the first two trials, making evaluations of the Hs ambiguous. Over
Trials 1 and 2 of the problem illustrated in Figure 4, for example,
behavior with respect to color would be identical with behavior with
respect to size, as would the behavior with respect to form and
position.

During Nonoutcome problems, on 
the other hand, behavior is specified 
for all trials of a problem. In the problem of Figure 4, a subject with
a form H (i.e., $H_a$) would respond RRLL (or LLRR; described more
generally as AABB). The Prediction postulate demands precisely this
behavior. No other behavior could yield 100% correct if $H_a$ were
correct. Similarly, a subject with $H_b$ would respond AAAA; a subject
with $H_c$ would respond ABAB; a subject with $H_d$ would respond ABBA.
This freedom to observe the manifestation of
the initial H over several trials of a 
Nonoutcome problem permits the 
emergence of unique response pat- 
terns corresponding to each H. In 
the problem shown in Figure 4, one 
need only observe which of the four 
response patterns is occurring to 
determine which H the subject is 
holding. Thus, the H held by a single 
subject on a single (Nonoutcome) 
problem may be determined. Also, 
if one wishes to revert to probabilistic 
statements about the strengths of the 
various Hs, one need only present 
the problem to a large number of sub- 
jects. The proportion of subjects 
showing each of the response patterns 
provides probability estimates of the 
corresponding Hs. 

In an internally orthogonal four-trial problem such as that shown in
Figure 4 one may also determine when one of the class of $H_r$ is
occurring. Response patterns other than the four listed above may
occur. For this problem, specifically, any combination of three A’s and
one B may also occur. Since they could not be produced by the four
given Hs, these patterns will be interpreted as reflecting $H_r$.
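
The reading of Hs from Nonoutcome response patterns, and the estimation of H probabilities from a group of subjects, might be sketched as follows. The pattern table applies only to the problem of Figure 4 as described in the text; the names `PATTERN_TO_H`, `classify`, and `estimate_probabilities`, and the single-letter labels, are assumptions made for this illustration.

```python
from collections import Counter

# Position-response patterns for the Figure 4 problem, as given in the text.
PATTERN_TO_H = {
    "AABB": "a",  # form
    "AAAA": "b",  # position
    "ABAB": "c",  # color
    "ABBA": "d",  # size
}


def normalize(responses: str) -> str:
    """Recode a string of 'L'/'R' position responses so that the side
    chosen on Trial 1 is 'A' and the other side is 'B'."""
    first = responses[0]
    return "".join("A" if r == first else "B" for r in responses)


def classify(responses: str) -> str:
    """Return the H label for a four-trial Nonoutcome response string.
    Any pattern not produced by the four dimensional Hs is labeled 'r'."""
    if len(responses) != 4 or set(responses) - {"L", "R"}:
        raise ValueError("expected four responses, each 'L' or 'R'")
    return PATTERN_TO_H.get(normalize(responses), "r")


def estimate_probabilities(all_responses) -> dict:
    """Proportion of subjects showing each H on one Nonoutcome problem."""
    counts = Counter(classify(r) for r in all_responses)
    n = len(all_responses)
    return {h: count / n for h, count in counts.items()}
```

As a usage example with made-up responses from four subjects, `estimate_probabilities(["LLRR", "RRLL", "LRLR", "LLLR"])` assigns half of the subjects to the form H, a quarter to the color H, and a quarter to the residual class.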

By presenting a Nonoutcome problem of internally orthogonal structure,
then, one may estimate the probabilities of all the various Hs
described by the model. The Nonoutcome problem may be used to determine
the H probabilities at a given point in a problem series. In the
experiment to be described, for example, a large group of subjects
receives a series of four-trial problems. At
various points in the series the group 
is subdivided, one half receiving an 
Outcome Problem, the other half 
receiving a Nonoutcome Problem. 
From the latter condition one may 
determine the probability of the Hs 
at the outset of the problem. The 
model assumes that these are also 
estimates of the initial H probabilities 
for the subjects facing the Outcome 
problem. The validity of this as- 
sumption will be demonstrated. 


EXPERIMENT II 
Method 


Subjects. The subjects were 255 students 
from the experimental psychology courses at 
Indiana University. None of these subjects 
had participated in Experiment I. 

Apparatus. Pairs of consonants of the 
alphabet were randomly selected to provide 
the stimulus forms for a given trial. The two 
letters were printed as transparent forms on 
an opaque background on a filmstrip negative. 
The pair of letters was printed four times with 
these variations: One of the letters was large 
and one was small; one was on the left and 
one on the right; the dimensions of letter, 
size, and position were mutually orthogonal, 
i.e., each level of each dimension appeared 
twice with each level of the other two dimen- 
sions. A color dimension was added by 
randomly selecting two hues from a set of 
seven transparent dies. (The hues were 
purple, blue, green, yellow, brown, red, and 
white.) The two colors were applied to the 
four reproductions of the pairs of letters, such 
that color was orthogonal to the other three 
dimensions. This produced the set of stimuli 
for the type of problem exemplified in Figure 
4, i.e., four pairs of stimuli constructed
from four mutually orthogonal dimensions. 
Twenty-four such problems were constructed. 
With a single exception, to be noted below, 
they were all internally orthogonal problems.
There are six different types of 4-trial
internally orthogonal problems, according
to which dimension double alternates with 
respect to position (AABB), which single 
alternates (ABAB), and which follows an 
ABBA pattern. These six were randomly 
assigned to the 24 problems. 
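
The count of six problem types follows from assigning the three patterns to the three non-position dimensions in every possible order. The sketch below enumerates them and checks pairwise orthogonality over the four trials. The identifiers are hypothetical, and the independent random draw per problem is an assumption, since the text does not say how the random assignment was carried out.

```python
import random
from itertools import permutations

PATTERNS = ("AABB", "ABAB", "ABBA")      # pattern with respect to position
DIMENSIONS = ("letter", "size", "color")


def problem_types():
    """Enumerate every assignment of the three patterns to the dimensions."""
    return [dict(zip(DIMENSIONS, perm)) for perm in permutations(PATTERNS)]


def is_internally_orthogonal(assignment):
    """True when each pair of dimensions shows all four level combinations."""
    seqs = list(assignment.values())
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            # Four trials and four possible combinations: all must differ.
            if len(set(zip(seqs[i], seqs[j]))) != 4:
                return False
    return True


types = problem_types()
rng = random.Random(0)                   # fixed seed, for reproducibility only
assigned = rng.choices(types, k=24)      # assumed: independent draw per problem
print(len(types), all(is_internally_orthogonal(t) for t in types))
```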

The filmstrips for the 24 problems were 
spliced together so that the problems could 
be presented conveniently in sequence. A 
pair of letters was projected by an overhead 
projector (Beseler Master Vu-Graph) onto a 
screen. Even in a well-lit room the letters 
appeared brightly with the appropriate hue. 
The larger of the two letters was approximately
6 × 3 inches when projected; the
smaller letter was half that size. The letters 
were 2 inches apart. 

Design. The subjects were divided into
two groups of 127 and 128, respectively. 
Each group received a preliminary demonstra- 
tion problem followed by the 24 problems of 
the experiment proper. The demonstration 
problem was constructed of the four dimen- 
sions indicated but was 14 trials in length. 
The 24 problems which followed this pre- 
liminary problem consisted of 18 Outcome 
problems with 6 Nonoutcome problems inter- 
spersed. One group (Group A) received the 
Nonoutcome condition on Problems 2, 6, 10, 
14, 18, and 22; the other group (Group B) 
received the Nonoutcome condition on 
Problems 4, 8, 12, 16, 20, and 24. This design 
offered two advantages. First, H strengths, 
which are derived from the Nonoutcome 
procedure, could be obtained on every other 
problem. This permits a fairly detailed 
picture of changes in H probabilities over the 
problem series. Second, while one group of 
subjects is receiving a Nonoutcome problem 
(for example, Group B on Problem 20) the 
other group, which has had an almost 
identical history of problem solving ex- 
perience in the experiment, is receiving an 
Outcome problem constructed of the same 
stimulus materials. The model permits 
prediction about behavior in the latter condi- 
tion from the H information obtained from 
the former. 
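
The alternating schedule described above may be written out as a short sketch. The function names are hypothetical; the problem numbers are those given in the Design paragraph.

```python
def nonoutcome_problems(group):
    """Problems on which a group receives the Nonoutcome condition."""
    start = {"A": 2, "B": 4}[group]
    return list(range(start, 25, 4))


def condition(group, problem):
    """Condition that a group receives on a given problem (1-24)."""
    if problem in nonoutcome_problems(group):
        return "Nonoutcome"
    return "Outcome"


for group in ("A", "B"):
    print(group, nonoutcome_problems(group))

# On every even-numbered problem one group is probed while the other
# group receives an Outcome problem built from the same materials.
print(condition("B", 20), condition("A", 20))
```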

The latter feature provided the rationale 
for the one deviation from internally orthog- 
onal problems during the experiment proper. 
The group receiving an Outcome problem 
when, at the same point, the other group was 
receiving a Nonoutcome problem did not 
necessarily receive an internally orthogonal 
problem. On these problems either zero, one, 
two, or all three of the remaining dimensions 
were confounded with the correct dimension. 
For example, if one dimension (for example, 
size) were confounded with the correct 
dimension (for example, color: green versus 
blue) then all blue letters would always be
large (or always small). If all four dimen- 
sions were confounded, then the same letter 
would be large, blue, and on the left side on 
every trial of the problem. This variation 
in confounding expanded the range over 
which predictions could be made. 

Procedure. The 255 subjects were run in 10 
classes of approximately 25 students each. 
The first pair of stimuli from the preliminary 
problem was projected on the screen before 
the class while the initial instructions were 
read. The subjects were told that they were 
to decide which of the two stimuli was correct, 
that they were to indicate their choice by 
filling in the appropriate side (right or left 
corresponding to the location of the stimulus 
chosen) by the first answer space of an IBM 
answer sheet and that the experimenter would 
then indicate which stimulus was correct. 
They were further told that there would be a 
series of such stimuli and that this procedure 
was to be followed on each presentation. 
After all the subjects made their first response 
the experimenter pointed to the correct 
stimulus. Following this outcome presenta- 
tion the next pair of stimuli appeared. The 
subjects responded and the experimenter 
again pointed to the correct stimulus. 
Fourteen trials took place in this manner. 
The forms for these preliminary trials were 
always the letters A and E, the colors were 
red and green, and the letters were of two 
sizes as described above. The experimenter 
always pointed to the larger letter on all 
trials for all subjects. 

When the 14 trials were ended, the experi- 
menter announced that the large letter was 
always correct and told the class that this was 
a demonstration problem. He then ex- 
plicitly described the four dimensions (large 
or small, right or left, the two colors, and the 
two letters) and stated that as in the pre- 
liminary problem where the large stimulus 
was always correct one of these cues would 
always provide the correct basis for re- 
sponding. 

The experiment proper was then begun. 
Outcomes were always presented except when 
Nonoutcome problems were scheduled. Dur- 
ing all Outcome problems the experimenter 
pointed to the correct stimulus after each 
trial. For Outcome Problems 1-12 one of the 
two colors always served as the basis for 
correct responding; on Problems 13-24 one 
of the two letters always served as the basis 
for correct responding. Thus, there were two 
concept formation learning sets: A color set 
followed by a form set. The second followed 
the first without any special announcement
or break.




Before the first Nonoutcome problem 
(Problem 2 for Group A; Problem 4 for 
Group B) the experimenter announced that 
the next problem would be a test of how 
much had been learned thus far. The class 
was told that during the next problem the 
experimenter would not point to the correct 
stimulus after each trial, that because this 
was a test the students were to continue to 
try to get 100% correct. The next four 
trials followed without outcomes. These 
trials were followed by the next three Out- 
come problems. Test instructions were then 
given again (before Problem 6 for Group A; 
Problem 8 for Group B) followed by another 
Nonoutcome problem. Three Outcome prob- 
lems continued to alternate with one Non- 
outcome problem until all 24 problems had 
been presented. 


Results 


The H probabilities on every second 
problem are presented in Figure 5. 
These probabilities are directly equiv- 
alent to the proportion of subjects 
manifesting the response pattern cor- 
responding to the H on each Non- 
outcome problem. Thus the increase 
in c over Nonoutcome Problems 2-12 
means that an increasing number of 
subjects are showing the response 
pattern corresponding to the color H 
on these problems. The figure shows 
how the probabilities of each H 
change, first during a color learning- 
set series (Problems 1-12) then during 
a form series (Problems 13-24). 

It is important to recall that each 
data point represented in this figure 
comes from a Nonoutcome problem, 
that the curves represent changes 
in response patterns recorded during 
problems when the experimenter is 
saying nothing. Furthermore, the 
points represent two different groups 
alternating on Nonoutcome problems. 
In spite of this unorthodox style of 
collecting data, the resulting curves 
are quite regular; the curves for c
during Problems 2-12 and for a during 
Problems 14-24 appear very much 
like typical, i.e., Outcome problem, 
learning curves, and the curves for
other Hs show regular, gradual
extinction effects.

FIG. 5. The probability of occurrence of the various Hs (the open circles
are for Group A; the filled circles are for Group B).

Certain details in this graph are 
noteworthy. The first is that b
scarcely ever occurs. These subjects 
almost never Predict that position is 
the basis for solution. The reasons 
for this effect are unclear. It may 
occur either because the adult human 
subject generally avoids response to 
position in such a problem or because 
of the particular procedures employed 
in this experiment. What is im- 
portant, however, is that the absence 
of consistent response to position 
means not only that b = 0 but that
there is no Response-set to position, 
i.e., there is no manifestation of posi- 
tion preference. This represents fur- 
ther validation of the assumption 
that Response-sets are negligible with 
this type of subject. 

The second detail is that when the 
learning-set changes there tends to be 
a small increase in the strength of
already extinguished Hs; i.e., when
the learning set shifts from color to 
letter, b, d, and R show a small incre- 
ment in strength. Restle (1958) sug- 
gested that this might happen. The 
method provides a technique for 
quantitative analysis of the effect. 
Third, more proficiency is achieved 
on the first learning set than on the 
second (c at maximum is .84; a reaches
only .60). It is possible that color is 
more salient than letter, but again, 
the particular procedures must be 
considered. The letter problems came 
in the second half of a massed, and 
probably tedious, procedure. There
was evidence that the subjects were 
tiring over the experimental session. 
One could have demonstrated this, 
although it was not done, by recording 
the number of yawns over the experi- 
ment. They clearly increased. It is 
possible, then, that the depression in 
the second learning-set function as 
well as some of the irregularities found 
in the second half of the data might
disappear with a more distributed 
procedure. 

Figure 5 portrays the information 
to be gathered by going beyond mere 
learning representations of data. Of 
course, this figure describes the H 
strengths from Nonoutcome problems. 
It is necessary now to consider its 
relevance to what traditionally has 
been of more universal interest, the 
Outcome problem. 

During any even-numbered prob- 
lem of Experiment II half the subjects 
received an Outcome problem while 
the other half received a similar 
problem without outcomes. Since 
the model deals with Hs at the outset 
of the problem one may assume, as 
has been done throughout, that the 
probabilities are the same for both 
types of problems. Therefore, if the 
probabilities are known for one type 
of problem they are, in theory, also 
known for the other. In theory, 
Figure 5, although derived from Non- 
outcome problems, also describes the 
H probabilities at the outset of 
the corresponding Outcome problems. 
This assertion may be tested by using 
the H probabilities from each Non- 
outcome problem to predict percent- 
age correct on Trial 2 of each corre- 
sponding Outcome problem. From 
the data in Figure 5, in effect, the 
model can predict the traditional 
learning-set function: percentage cor- 
rect on Trial 2 of successive problems. 

The argument is straightforward:

1. On any Outcome problem one of 
the dimensions, and, therefore, one of 
the Hs is always designated as correct. 
The probability of the correct H (as 
inferred from the Nonoutcome prob- 
lem) provides an estimate of the 
proportion of the subjects on the 
corresponding Outcome problem who, 
because they follow this H, would be 
correct on Trial 2. For example, on 
Problem 2, P(H_c) = .32. This means




that 32% of the subjects facing the 
Outcome condition on Problem 2 will 
make a correct response on Trial 2. 

2. Zero to three dimensions may be 
confounded with the correct dimen- 
sion during Trials 1 and 2 of the 
Outcome problem. Suppose, for the 
problem described in Figure 4, that 
color were the correct dimension. 
One dimension, size, is confounded 
with it over Trials 1 and 2. The 
probability of an H corresponding to 
such a confounded dimension provides 
an estimate of additional subjects who 
would be correct on Trial 2 of the 
corresponding Outcome problem. On 
Outcome Problem 2 also, size is confounded
with color. P(H_d) = .25, so
that in addition to the 32% men- 
tioned above another 25% would be 
correct on Trial 2. 

3. All subjects holding an H about 
a dimension unconfounded with the 
correct dimension over Trials 1 and 2 
(form and position in the example of 
Figure 4 and on the second Outcome 
problem) would, because of the “Win- 
stay-Lose-shift” character of the be- 
havior required by the Prediction 
postulate, make an incorrect response 
on Trial 2. 

4. For the proportion of subjects 
holding the Residual H it will be 
assumed that half would be correct on 
Trial 2 and half would be incorrect. 
Since P(H_R) = .18 on the second
problem, an additional 9% of the 
subjects would be correct on Trial 
2 of the corresponding Outcome 
problem. 

Thus, for Problem 2, percentage
correct on Trial 2 is given as:

$$\%\ \text{Correct} = P(+_2) \times 100 = (.32 + .25 + .09) \times 100 = 66\%.$$

In general,

$$P(+_2) = P(H_+) + \sum P(H_k) + \frac{P(H_R)}{2},$$

where $P(H_+)$ is the probability of the
correct H, $\sum P(H_k)$ is the sum of
the probabilities of the Hs corresponding
to dimensions confounded with
the correct dimension over Trials 1
and 2, and $P(H_R)$ is the probability of the Residual H.
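
As a check on the arithmetic, the four-step argument can be expressed as a small function. This is a sketch only; the function and parameter names are hypothetical, while the three probabilities are the Problem 2 values quoted in the text.

```python
def predicted_trial2_correct(p_correct, p_confounded, p_residual):
    """Predicted proportion correct on Trial 2 of an Outcome problem.

    p_correct: probability of the H for the correct dimension.
    p_confounded: probabilities of the Hs for dimensions confounded with
        the correct dimension over Trials 1 and 2 (may be empty).
    p_residual: probability of the Residual H, half of which is assumed
        to yield a correct response.
    """
    return p_correct + sum(p_confounded) + p_residual / 2


# Problem 2: color correct (.32), size confounded (.25), Residual (.18).
pct = 100 * predicted_trial2_correct(0.32, [0.25], 0.18)
print(round(pct))
```

Hs for unconfounded dimensions contribute nothing, since subjects holding them are expected to respond incorrectly on Trial 2.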

Table 1 shows the dimensions which 
were confounded with the correct 
dimension during Trials 1 and 2 of the 
Outcome problems. It also shows in 
the right-hand column, the resulting 
formulas for determining the theoretical
P(+_2) for each problem.
From the information in Figure 5 one 
may obtain the numerical theoretical 
values. These have been obtained as 
percentages and are compared to the 
actual percentage correct in Figure 6. 

Consider first the solid line. This 
connects the empirical percentage 
correct values, i.e., 100(#right)/N. 
For all its irregularities it is a learning- 
set function, by definition (Percent- 
age correct on Trial 2 over successive 
problems—see Harlow, 1959). The 
irregularities result, one should note,
not from careless planning or sloppy
procedure but rather, in large measure,
from the deliberate confounding of
dimensions. It was the task of the
model to reproduce these irregularities.

FIG. 6. The obtained (solid line) and predicted (dashed line) curves
of the percentage correct on Trial 2 of successive Outcome problems.


The theoretical values are repre- 
sented by the dashed line. The 
irregularities are well reproduced. 
Every change in direction of the curve 
is predicted. The H data, then, from 
the Nonoutcome problems permit 
generally good predictions of the 
performance at the outset of the Out- 
come problems. In this sense, the 
assertion that the H values portrayed 
in Figure 5 describe the values had 
an Outcome problem been presented 
receives some verification. A word of 
caution is needed, however. Three of 
the theoretical points (specifically at 
Problems 10, 22, and 24) are significantly
different (p < .05 by χ² test)
from the obtained points. The source 
of these differences, whether the 
model, or the procedure, is unknown. 


DISCUSSION 


A model has been presented which 
views the adult human subject as 
selecting, at the outset of a dis- 
crimination problem, one H from a 
set of Hs, where the H is defined as a 
mediating process. Within this gen- 
eral conception two variations of the 
model were considered. In the first 
the situation was restricted to the 
two-dimensional discrimination situa- 
tion, but little restriction was im- 
posed upon the type of H which the 
subject might hold. This permitted 
the detailed description of two classes 
of Hs: Predictions and Response-sets. 
An experiment consisting of two-trial 
Outcome and Nonoutcome problems 
effectively demonstrated that for the 
two-dimensional situation the model 
accounts for the relationship between 
these two types of problems. The 
results also suggested that Response- 
set Hs were not occurring. 

The latter finding was employed in 
the second variation of the model. 
The omission of Response-set Hs from
consideration helped simplify the 
model for application to the n-dimensional
problem. The application was
restricted to the concept formation 
situation, i.e., to the setting in which 
it was assumed that the subject was 
attempting to locate that cue which 
provided the basis for correct respond- 
ing. This variation permitted the 
specification of the H held by a single 
subject on any particular problem as 
well as statements about the proba- 
bility of the Hs. A technique was 
demonstrated for validating the prob- 
ability statements. 

The general model has some special
features and has uncovered some
unique results which are worth
stressing.


1. The H, rather than the specific 
choice response on a particular trial, is 
regarded as the dependent variable, 
i.e., as the unit of behavior affected 
by the reinforcements. This point of 
view has several advantages. First, as 
noted in the analysis of H behavior by 
monkeys (Levine, 1959), the learning- 
set effect can be treated within the 
context of a conditioning theory. In 
the typical learning-set experiment 
the Prediction that one of the objects 
is correct and will repeat is the only 
H which receives 100% reinforcement 
beyond Trial 1. This feature produces 
the increase in “Win-stay-Lose-shift
with respect to the object” relative 
to the other Hs. 

Second, the paradox of alternation 
learning is eliminated. This paradox 
derives from the traditional view of 
reinforcement as increasing the proba- 
bility of the last-made choice response. 
Consider a subject in Group G-0 in 
Experiment I. He responds by choos- 
ing one of two stimuli on Trial 1 of 
any problem. On half of the prob- 
lems the experimenter will then say 
“right.” This outcome is virtually 
universally regarded as a reinforcement
of the response. But does it
have this function? The subject 
typically chooses the other stimulus 
on the next trial. Therefore, the 
experimenter has reinforced one re- 
sponse yet has increased the proba- 
bility that the subject will make the 
other response—a puzzling result by 
most definitions of the term “rein- 
forcement.” The resolution of this 
paradox by the present model is that 
the behavior affected by the rein- 
forcements is not the choice response 
but the H selected, i.e., the Prediction 
of how events will proceed. In this 
particular example a series of out- 
comes reinforces the Prediction that 
the correct stimulus alternates. 
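
The resolution of the paradox can be made concrete with a sketch of how the H held, rather than the last response, determines the next choice. The H labels and stimulus names are hypothetical; the rules follow the Prediction and Response-set distinction used in this paper.

```python
def next_choice(h, last_choice, outcome):
    """Trial 2 choice implied by the H held, given the Trial 1 result.

    h: label of the H held by the subject.
    last_choice: stimulus chosen on Trial 1, "X" or "Y".
    outcome: "right" or "wrong", as announced by the experimenter.
    """
    other = {"X": "Y", "Y": "X"}[last_choice]
    if h == "stimulus repeats":        # Prediction: Win-stay-Lose-shift
        return last_choice if outcome == "right" else other
    if h == "stimulus alternates":     # Prediction: correct stimulus alternates
        return other if outcome == "right" else last_choice
    if h == "stimulus preference":     # Response-set: outcomes are ignored
        return last_choice
    raise ValueError(f"unknown H: {h}")


# After "right", a subject Predicting alternation chooses the other
# stimulus, although the response just made was the one "reinforced".
print(next_choice("stimulus alternates", "X", "right"))
```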

Third, Experiment II makes it clear 
that learning may be occurring with- 
out being manifested in the record of 
correct choice response, i.e., in the 
traditional learning curve. The ob- 
tained percentage correct on Trial 2, 
represented by the solid line in Figure 
6, shows no clear systematic increase 
over problems. One must analyze the 
changes in H strengths, as was done 
in Figure 5, in order to see the 
learning. 

2. The definition of the H has been 
shifted from a behavior pattern to a 
mediating process of which the be- 
havior pattern is a manifestation. 
The former definition was employed 
earlier in the model for monkeys. It 
was possible to define the H in this 
way because the experimenter utilized 
only a single procedure (presenting 
outcomes) resulting, with a given H, 
in a single behavior pattern. It would 
have been superfluous to assert that 
an H represented a Response-set 
or Prediction which resulted in the 
behavior pattern. The change was 
necessitated because the experimenter 
now employed two procedures: pre- 
senting and withholding outcomes. 
The behavior patterns defining an H 
previously (for Outcome problems)
must, for some Hs, necessarily change
when outcomes are withdrawn. For 
example, a subject with a strong
“Win-stay-Lose-shift” habit must change his
behavior during Nonoutcome prob- 
lems, since “Win” and “Lose” are not 
available as stimuli for staying and 
shifting. By transferring the defini- 
tion of the H from the response 
pattern itself to the determinants of 
the response pattern, i.e., to the
mediating process, one was able to 
select intuitively reasonable postu- 
lates which permitted the specification 
of the change in the response pattern 
when outcomes were withdrawn. 

3. A specific relationship exists be- 
tween Outcome and Nonoutcome 
problems. This relationship is pre- 
dictable from the model, in part 
because of the assumption that the 
same set of Hs determines behavior 
in both types of problems. This 
relationship was differently employed 
in the two experiments reported 
above. Experiment I simply demon- 
strated that the results predicted by 
the model in fact occurred. It 
demonstrated this for four differently 
treated groups receiving two-trials- 
per-problem learning sets. 

Experiment II employed the rela- 
tionship to determine Hs in a problem 
series from the Nonoutcome problems. 
The Nonoutcome problem became a 
“probe” to determine, for each in- 
dividual subject, the H he was 
holding at the time. 

4. A distinction is explicitly made 
between Predictions (manifested, 
during Outcome problems, in be- 
havior contingent upon outcomes) and 
Response-sets (manifested in behavior 
which is always independent of out- 
comes). Hypothesis analyses up until 
now have overlooked the distinction. 
Krechevsky (1932), for example, de- 
monstrated Response-sets (position 
preference, stimulus preference) in 
rats but wrote as though the behavior 
were manifestations of Predictions.
He characterized hypotheses as 
follows: 


an “hypothesis” is something that must be 
verified before it is persisted in. If the hy- 
pothesis does not lead to certain expected 
results it is soon dropped. “If this attempt
is correct, then I should get such and such 
results, if I do not get such and such results 
then I must change my behavior.” . . . The 
rat, in the maze or in the discrimination box, 
behaves in the very same way [pp. 529-530]. 


Note the dependence stipulated 
upon “results” (outcomes) as determiners
of the response patterns.
In fact, however, Krechevsky pre- 
sented data only for behavior patterns 
which persist regardless of the out- 
comes, patterns which were Response- 
set manifestations. 

This interpretation by Krechevsky 
caused Spence (1940) to retort 


hypotheses are far from what he [the writer, 
Spence] understands by the terms insightful 
and intelligent. Only persistent non-adapta- 
tive responses can attain the distinction of 
being hypotheses—for, in order to classify as 
a hypothesis, a response, although ineffective, 
must continue to be persisted in a certain 
minimum number of times. A maladaptive 
act which is speedily (intelligently?) aban- 
doned cannot ever be a hypothesis [p. 287]. 


In effect, Krechevsky argued that 
all systematic behavior manifested an 
attempt by the subject to maximize 
his rewards; Spence’s reply was that 
the behavior was of a different order. 
In terms of the analysis presented 
here Krechevsky was inferring Pre- 
dictions from behavior patterns which 
are manifestations of Response-sets. 

5. The prevalence of Response-sets 
in animals contrasts sharply with the 
present results. As just noted, Kre- 
chevsky found Response-sets in rats. 
Schusterman (1961) demonstrated 
similar Hs in chimpanzees, and Levine 
(1959) and Harlow (1950) demonstrated
them in monkeys. Response-set
Hs are widespread among infrahuman
animals. According to the
experiments reported here, adult
human subjects (specifically, college
students) show no Response-set Hs.
Thus, one could have assumed that 
these Hs were zero in Experiment I 
without hurting the quality of pre- 
diction; the explicit assumption in 
Experiment II did not seem to impair 
the effectiveness of the model; one 
Response-set commonly observed 
among animals, position preference, 
was clearly absent in Experiment II. 

Although the model has these above 
mentioned features to recommend it, 
it does not yet stand as a compre- 
hensive theory. There are several 
problems which remain to be solved. 
First, the model deals with Hs only at
the outset of problems. This restric- 
tion was made primarily because 
there is currently no basis for making 
assumptions about the effects of out- 
comes beyond Trial 2. A technique 
for investigating these effects will be 
needed before the model can be 
generalized. Second, in going from 
the two-dimensional (Experiment I) 
to the n-dimensional (Experiment II) 
problem the class of Hs was restricted 
to Predictions that the correct cue 
would repeat from trial to trial. 
While this restriction may be satisfactory
for well-motivated college
students it is undoubtedly not ade- 
quate for application to all subjects. 
Children, for example, would probably 
require a model which measured both 
Predictions and Response-sets in the 
n-dimensional problem. Third, the
definition of the Residual H is far 
from settled. The treatment of this 
H, that it was manifested by any 
response pattern not strictly conform- 
ing to a recorded dimension, was 
simple. The chief defect with this 
approach, however, is that it leaves 
out of account momentary sources of 
error, “slips” by a subject who has the
correct (or any other) H. Because of 
this, the Residual H, as now measured, 
would increase artifactually with
longer problems. It is anticipated 
that a more satisfactory treatment of 
Residual response patterns will be 
distilled as data accumulate. Finally, 
the various techniques of measuring 
mediating processes need to be refined 
and compared. Experiment II dem- 
onstrated that, for the n-dimensional 
problem, the Nonoutcome procedure 
may serve to determine the Hs. 
M. Richter, at Indiana University, is 
currently developing the model to 
measure H strengths directly from 
Outcome problems of n dimensions. 
Another technique, one which is time- 
honored but not tested, is to utilize 
verbal reports. Several researchers 
(Bruner et al., 1956; Heidbreder, 
1924; Verplanck, 1962) have had the 
subject state his hypothesis before 
each trial or certain trials. Insuffi- 
cient attempt, however, has been 
made to investigate the relationship 
between the verbal response and the 
choice responses. The three tech- 
niques need to be developed, brought 
within the framework of the model, 
and compared. 
APPENDIX A


The technique for evaluating the H
strengths from Outcome-problem data
will be presented here. The two problem
types (A and B) and the four outcome-response
sequences ($+_1 I_s$, $+_1 O_s$, $-_1 I_s$,
$-_1 O_s$) may be organized into a matrix of
eight cells, in which each cell refers to one
of the possible events which may occur.
This matrix is presented in Table A1.
After a set of Outcome problems has been
presented to the subjects, the frequency
with which each event has occurred may
be tabulated. Each frequency will be
referred to as $n_{ij}$, where the first subscript
denotes the problem type (or row
of the matrix) and the second subscript
denotes the outcome-response pattern
(or column of the matrix). Corresponding
to the frequency will be a ratio showing
the proportion of times that the same
(or other) stimulus was chosen on Trial 2,
given the problem type and results of
Trial 1. For example,

$$p_{11} = P(I_s \mid A, +_1) = \frac{n_{11}}{n_{11} + n_{12}}.$$

Table A1 shows that the eight $p_{ij}$ are
similarly defined.

TABLE A1

FREQUENCY ($n_{ij}$) AND PROPORTION ($p_{ij}$) WITH WHICH EACH OF THE EIGHT
EVENTS MAY OCCUR IN A BLOCK OF OUTCOME-PROBLEM DATA

$$
\begin{array}{c|cccc|c}
 & \multicolumn{4}{c|}{\text{Outcome-response sequence}} & \\
\text{Type} & +_1 I_s & +_1 O_s & -_1 I_s & -_1 O_s & \text{Total} \\
\hline
A & n_{11} & n_{12} & n_{13} & n_{14} & N_{1.} = \sum_j n_{1j} \\
  & p_{11} = \frac{n_{11}}{n_{11} + n_{12}} & p_{12} = \frac{n_{12}}{n_{11} + n_{12}} & p_{13} = \frac{n_{13}}{n_{13} + n_{14}} & p_{14} = \frac{n_{14}}{n_{13} + n_{14}} & \\
B & n_{21} & n_{22} & n_{23} & n_{24} & N_{2.} = \sum_j n_{2j} \\
  & p_{21} = \frac{n_{21}}{n_{21} + n_{22}} & p_{22} = \frac{n_{22}}{n_{21} + n_{22}} & p_{23} = \frac{n_{23}}{n_{23} + n_{24}} & p_{24} = \frac{n_{24}}{n_{23} + n_{24}} &
\end{array}
$$

The first step in evaluating the H
probabilities is to express the $p_{ij}$ in
terms of $a, a', \ldots, y'$. This will be done,
by way of example, for $p_{11}$. Since, with
this example, only Type A problems are
considered, the A will be omitted from
the subsequent formulas. The expression
above for $p_{11}$ becomes,

$$p_{11} = P(I_s \mid +_1) = \frac{P(+_1 I_s)}{P(+_1)}.$$

$P(+_1 I_s)$ may be expanded to:

$$
\begin{aligned}
P(+_1 I_s) &= P(H_a +_1 I_s) + P(H_{a'} +_1 I_s) + \cdots + P(H_{y'} +_1 I_s) \\
&= P(I_s \mid H_a +_1) P(H_a +_1) + P(I_s \mid H_{a'} +_1) P(H_{a'} +_1) + \cdots + P(I_s \mid H_{y'} +_1) P(H_{y'} +_1) \\
&= P(I_s \mid H_a +_1) P(H_a) P(+_1) + P(I_s \mid H_{a'} +_1) P(H_{a'}) P(+_1) + \cdots + P(I_s \mid H_{y'} +_1) P(H_{y'}) P(+_1).
\end{aligned}
$$

The Prediction and Response-set postulates
specify $P(I_s \mid H_i +_1)$ so that the
last equation may be rewritten:

$$
\begin{aligned}
P(+_1 I_s) &= 1 \cdot a \cdot P(+_1) + 0 \cdot a' \cdot P(+_1) + \cdots + 0 \cdot y' \cdot P(+_1) \\
&= P(+_1)[a + b + x + y].
\end{aligned}
$$

Therefore,

$$P(I_s \mid +_1) = a + b + x + y,$$

or

$$p_{11} = a + b + x + y = \frac{n_{11}}{n_{11} + n_{12}}.$$

The other $p_{ij}$ may be obtained in a
similar manner, yielding the eight following
equations:

$$
\begin{aligned}
p_{11} &= a + b + x + y \\
p_{12} &= a' + b' + x' + y' \\
p_{13} &= a' + b' + x + y \\
p_{14} &= a + b + x' + y' \\
p_{21} &= a + b' + x + y' \\
p_{22} &= a' + b + x' + y \\
p_{23} &= a' + b + x + y' \\
p_{24} &= a + b' + x' + y.
\end{aligned}
$$

Because of the complementary relation
between the repeat versus alternate pairs
of Hs, the individual symbols may not be
evaluated by any general technique. It
is possible, however, to evaluate the
difference between a complementary
pair of Hs. It is possible, for example, to
evaluate $D_a = a - a'$. This is obtained
by adding $p_{11}$, $p_{14}$, $p_{21}$, and $p_{24}$, and
applying Equation 1.

This yields,

$$D_a = a - a' = \frac{p_{11} + p_{14} + p_{21} + p_{24} - 2}{2}.$$

Similarly,

$$D_b = b - b' = \frac{p_{11} + p_{14} + p_{22} + p_{23} - 2}{2},$$

$$D_x = x - x' = \frac{p_{11} + p_{13} + p_{21} + p_{23} - 2}{2},$$

$$D_y = y - y' = \frac{p_{11} + p_{13} + p_{22} + p_{24} - 2}{2}.$$
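
The estimation in this appendix can be checked numerically with a minimal sketch. It assumes that the frequencies are held in a 2 × 4 nested list (rows A and B; columns $+_1 I_s$, $+_1 O_s$, $-_1 I_s$, $-_1 O_s$) and that the eight H probabilities sum to 1. All function names and the numerical H values are hypothetical and serve only to verify the algebra.

```python
def proportions(n):
    """Convert a 2x4 table of frequencies n[i][j] into proportions p[i][j]."""
    p = []
    for row in n:
        after_right = row[0] + row[1]    # problems with "right" on Trial 1
        after_wrong = row[2] + row[3]    # problems with "wrong" on Trial 1
        p.append([row[0] / after_right, row[1] / after_right,
                  row[2] / after_wrong, row[3] / after_wrong])
    return p


def differences(p):
    """Differences between complementary Hs, from the eight proportions."""
    (p11, p12, p13, p14), (p21, p22, p23, p24) = p
    return {
        "D_a": (p11 + p14 + p21 + p24 - 2) / 2,
        "D_b": (p11 + p14 + p22 + p23 - 2) / 2,
        "D_x": (p11 + p13 + p21 + p23 - 2) / 2,
        "D_y": (p11 + p13 + p22 + p24 - 2) / 2,
    }


def model_proportions(a, a_, b, b_, x, x_, y, y_):
    """The eight model equations; a_ stands for a', b_ for b', and so on."""
    return [
        [a + b + x + y, a_ + b_ + x_ + y_, a_ + b_ + x + y, a + b + x_ + y_],
        [a + b_ + x + y_, a_ + b + x_ + y, a_ + b + x + y_, a + b_ + x_ + y],
    ]


# Hypothetical H probabilities (they sum to 1).
h = dict(a=0.40, a_=0.10, b=0.15, b_=0.05, x=0.12, x_=0.08, y=0.07, y_=0.03)
d = differences(model_proportions(**h))
print({name: round(value, 2) for name, value in d.items()})
```

With these hypothetical values the recovered differences equal $a - a'$, $b - b'$, $x - x'$, and $y - y'$, which is the property the derivation relies on.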
APPENDIX B 
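The determination of $NI_s$, the theoretical
number of times that the same
stimulus is selected in a block of Nonoutcome
problems, will be presented
here. The two problem types (A and B)
and the two response sequences ($I_s$ and
$O_s$) may be organized into a matrix of
four cells in which each cell refers to one
of the possible events which may occur.
This matrix is presented in Table B1.

TABLE B1

FREQUENCY ($t_{ij}$) AND PROPORTION ($q_{ij}$)
WITH WHICH EACH OF THE FOUR
EVENTS MAY OCCUR IN A BLOCK OF
NONOUTCOME-PROBLEM DATA

$$
\begin{array}{c|cc|c}
 & \multicolumn{2}{c|}{\text{Response sequence}} & \\
\text{Type} & I_s & O_s & \text{Total} \\
\hline
A & t_{11} & t_{12} & T_{1.} = t_{11} + t_{12} \\
  & q_{11} = \frac{t_{11}}{t_{11} + t_{12}} & q_{12} = \frac{t_{12}}{t_{11} + t_{12}} & \\
B & t_{21} & t_{22} & T_{2.} = t_{21} + t_{22} \\
  & q_{21} = \frac{t_{21}}{t_{21} + t_{22}} & q_{22} = \frac{t_{22}}{t_{21} + t_{22}} & \\
\hline
 & NI_s = t_{11} + t_{21} & &
\end{array}
$$

From this table one may note that:

$$q_{11} = P(I_s \mid A) = \frac{t_{11}}{T_{1.}}$$

and that,

$$q_{21} = P(I_s \mid B) = \frac{t_{21}}{T_{2.}}.$$

Employing mathematical arguments
analogous to those in Appendix A, one
obtains:

$$P(I_s \mid A) = a + b + x + y = \frac{t_{11}}{T_{1.}}$$

and

$$P(I_s \mid B) = a + b' + x + y' = \frac{t_{21}}{T_{2.}}.$$

These equations may be rewritten as:

$$
\begin{aligned}
t_{11} &= T_{1.}(a + b + x + y) \\
t_{21} &= T_{2.}(a + b' + x + y').
\end{aligned}
$$

If the condition holds that $T_{1.} = T_{2.}
= T/2$, i.e., if half the problems are of the
A type and half are of the B type, then:

$$
\begin{aligned}
NI_s &= t_{11} + t_{21} \\
&= (T/2)[2a + b + b' + 2x + y + y'] \\
&= (T/2)[a + x + (a + b + b' + x + y + y')] \\
&= (T/2)[a - a' + x - x' + 1]
\end{aligned}
$$

or,

$$NI_s = (T/2)[D_a + D_x + 1].$$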


This last equation shows that the 
number of repeated stimulus selections 
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during Nonoutcome problems may be 
predicted once Da and D, have been 
determined from Outcome problems. 
This equation is intuitively plausible, 
since it states that NJ, will be greater the 
relatively greater is the tendency either 
to Predict that the correct syllable re- 
peats or to have a Response-set to repeat 
syllables. For this reason, this equation 
was used to predict NI.. This equation, 
however, may be reduced to a simpler 
form, one which provides an interesting 
theorem about Outcome and Nonout- 
come results. 

If the equations in Appendix A for $D_a$
and $D_x$ are substituted into this last
equation for $NI_2$, this equation reduces
to

$$\frac{NI_2}{T} = \frac{p_{11} + p_{21}}{2}.$$

It will be recalled that $p_{11} = P(I_2 \mid A, +_1)$;
$p_{21} = P(I_2 \mid B, +_1)$. The last equation
says, essentially, that the proportion of 
repeats during Nonoutcome problems is 
equal to the proportion of repeats during 
Outcome problems when the experimenter 
said “right” after the first response. This
may be stated in its most general form as 
follows: 

Theorem: When a subject is trying to 
obtain 100% correct during Nonoutcome 
problems he behaves as though the 
experimenter were saying “right” after 
each response. 
MEASUREMENT OF VERBAL RELATEDNESS:
AN IDIOGRAPHIC APPROACH¹
A theoretical model was developed in which the associative meaning
of a word was defined as the set of associates evoked by that word. 
The relatedness between 2 words was postulated to be dependent upon 
the degree of overlap of their respective associative meanings. A meas- 
ure of relatedness was described between words. 3 experiments which 
investigated the use of this measure were reported. A high positive 
correlation was found between values of the measure and subjective 
judgments of relatedness. It was also found that individual differences 
in creativity affect the stability of the measure and the appropriateness 
of particular weighting exponents used to compute the measure. 


At least three approaches to the 
problem of meaning similarity have 
been proposed in recent literature. 
Osgood, Suci, and Tannenbaum
(1957) have emphasized the non- 
verbal representational aspects of 
meaning and meaning similarity, 
where D expresses the similarity be- 
tween two words in terms of the 
distance between their semantic pro- 
files. However, this technique has 
limitations which Osgood himself realizes
(Osgood et al., 1957): “Many
denotatively distinct concepts may
occupy the same region of our seman-
tic space, i.e., may have highly
similar profiles—‘hero’ and ‘success’
and ‘nurse’ and ‘sincere’ would be ex-
amples [p. 323].” Thus, there must
be some aspect of meaning similarity
which is not captured by the D score. 

In contrast to Osgood’s approach, 
several authors have explained simi- 
larity judgment in terms of verbal 
associative factors. Jenkins and Cofer 
(1957), Cofer (1957), and Deese (1962) 
have proposed measures which vary as 
a function of the number of identical 
associates given by a group to both 
members of a pair of words. In the 
Cofer study the subjects gave one 
association to each stimulus word. 
Response frequencies were tabulated 
and converted to percentages. For 
each different response given com- 
monly to the members of a pair the 
smaller percentage value was added 
to the smaller values of any other 
responses given commonly to the 
same pair of words. The resulting 
sum reflects the percent of the group's 
responses which the two stimulus 
words have in common. Although no
test of significance was employed in
this study, the measure seemed to
relate positively to judged similarity 
as scaled by the Haagen (1949) 
technique. 
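
The computation described in this paragraph can be sketched in a few lines. This is a minimal illustration of the procedure as summarized here, not a reproduction of Cofer's own scoring; the function names and the response data are hypothetical.

```python
from collections import Counter


def response_percentages(responses):
    """Convert one-response-per-subject data into a percentage for each response."""
    counts = Counter(responses)
    total = len(responses)
    return {word: 100.0 * count / total for word, count in counts.items()}


def common_response_overlap(responses_u, responses_v):
    """Sum, over responses given to both stimuli, of the smaller percentage."""
    pct_u = response_percentages(responses_u)
    pct_v = response_percentages(responses_v)
    shared = set(pct_u) & set(pct_v)
    return sum(min(pct_u[word], pct_v[word]) for word in shared)


# Hypothetical single associations from 10 subjects to each stimulus word.
to_ocean = ["water"] * 5 + ["sea"] * 3 + ["wave"] * 2
to_sea = ["water"] * 4 + ["ocean"] * 3 + ["wave"] * 1 + ["salt"] * 2

overlap = common_response_overlap(to_ocean, to_sea)
print(overlap)
```

In this made-up example only "water" and "wave" are shared, so the score is the sum of the two smaller percentages for those responses.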

Bousfield, Whitmarsh, and Danick 
(1958) and Whitmarsh and Bousfield 
(1961) have employed a similar tech- 
nique in the successful prediction of 
stimulus generalization. These studies
described an index which represented
the proportion of the associational
responses to one word that also
occur as responses to another word. 
These proportions were tabulated 
from single responses given by a 
group of subjects. 

There are two apparent limitations 
to the Cofer, Deese, Bousfield ap- 
proach. By restricting the subjects 
to one association many of the idio- 
syncratic responses that are usually 
displayed near the end of an indivi- 
dual’s associative hierarchy are lost. 
This loss must be accompanied by a 
corresponding reduction in sensitivity. 
Furthermore, because these measures 
are based upon group estimates of 
similarity, they may not be applicable 
to problems connected with individual 
differences in associative structure. 

The third approach to the study of 
meaning similarity attempts to modify 
Osgood’s theory. Flavell and Johnson
(1961) asked the subjects to give a
signal as soon as they perceived a 
similarity between the members of 
pairs of words. This latency measure 
(L) was found to vary inversely with 
line judgments of similarity. Flavell 
(1961b) also found a positive correla- 
tion between line judgments of simi- 
larity and estimates of the probability 
that the members of pairs of words 
would “co-occur” in the same spatial 
setting. 

In the development of these two 
measures Flavell (1961a) introduces 
the idea that a full description of the 
meaning of a word ought to account 
for the nonverbal representational 
components associated with the dis- 
tinguishable attributes of both the 
referent object and the nonreferent 
objects present in the physical context 
of the referent object. He feels that 
the semantic differential (Osgood et 
al., 1957) accounts for that portion of 
meaning associated with the referent
object, but that it does not reflect
the representational component con-
nected to the nonreferent objects.
It is his contention that L and the
co-occurrence measure (C) account for
this nonreferent portion of the non- 
verbal mediational correspondence 
which D is unable to tap. Either of 
the measures taken in combination 
with D should predict semantic simi- 
larity better than D alone. His data, 
however, indicate that this relation- 
ship may not exist. In addition, 
Flavell (1961a) implies that his meas- 
ures somehow account for verbal 
associational factors as well as for 
nonverbal mediational components. 

There are a number of difficulties 
with Flavell’s approach. First, it 
might be argued that Osgood’s seman- 
tic profiles account for nonreferent 
representational factors. If this is 
true the distinction between referent 
and nonreferent measures becomes 
inappropriate. Even if we accept for 
the moment that the distinction is 
legitimate, we have no guarantee that 
his measures are tapping nonverbal 
mediational components. It could be 
argued that L varies inversely with the
number of existing common associates, 
and that similarity in meaning varies 
with the number of common associ- 
ates. The fewer the number of 
associates two words have in common, 
the longer it would take an individual 
to chance on one of them. A difficulty 
with C is that it is dependent upon the 
degree to which similar things are 
thought of as coexisting. It will not 
predict accurately when two elements 
are similar but do not co-occur 
(Indian elephant—African elephant) or 
co-occur but are not similar (wrist- 
watch—man). 

Flavell and Johnson (1961) propose 
a third measure which is assumed to 
be equivalent to C and L in terms of 
its relationship to Osgood’s D. It was 
found that the number of similarities 
perceived between members of a pair
of words varied directly with the 
judged similarity of these words. 
This technique suffers from the same 
kind of confusion that surrounds L. 
Although it was designed to estimate 
the distance between nonverbal repre- 
sentational components connected to 
the nonreferent objects, its success 
might be explicable in terms of verbal 
associative factors. The number of 
perceived similarities may depend on 
the number of common associates 
existing within the hierarchies of the 
two stimulus words. In the process of 
attempting to integrate verbal and 
nonverbal components of meaning 
similarity, Flavell seems to lose the 
ability to distinguish between the 
effects of the two. 

Several investigators, then, have 
developed measures which quantify 
various aspects of relatedness. Al- 
though these have all been demon- 
strated to have predictive power, each 
is in some way inadequate. A more 
adequate measure of meaning simi- 
larity would distinguish between the 
effects of verbal and nonverbal factors 
and it would be applicable to the 
study of individual differences. It 
should be sensitive to the full range of 
possible types of similarity and should 
measure basic or underlying processes 
rather than derivatives of these proc- 
esses. Furthermore, it may be of 
more value to develop an explanatory 
measure which could lead to further 
insight and study than simply to
construct a good predictor.

Hopefully, this paper presents such 
a measure of verbal relatedness. The 
premise from which the measure is 
developed is that the relatedness of 
two words varies directly with the 
degree to which the associative hier- 
archies connected to those two words 
are identical. This notion finds sup- 
port in Underwood and Richardson’s 
(1956) statement that a relationship 
exists “only when one or more common
responses are evoked by the
different stimuli [p. 84].” The pro- 
posed measure is based upon verbal 
associative processes alone and does 
not attempt to account for representa- 
tional factors which may influence 
judged relatedness. It is distinct 
from the techniques employed by 
Deese (1962), Cofer (1957), and Bous- 
field, Whitmarsh, and Danick (1958), 
in that it is based upon more than one 
response per subject and is applicable 
to the study of individual differences 
in verbal behavior. 

Underwood (1952) has drawn a 
distinction between the general notion 
of relatedness and specific kinds of 
relatedness. This implies that two 
words may bear any number of 
specific independent relationships to 
each other, each of which contributes 
to the total relatedness of the two 
words. For instance, while similarity 
is one kind of relationship which may 
contribute to the relatedness of two 
words, they may be related in a 
number of other ways. Similarly, 
probability of evocation in an associa- 
tion task reflects one aspect of the 
relatedness of two words. Yet prob- 
ability of evocation does not neces- 
sarily reflect total relatedness as 
evidenced by a comparison of the 
word pairs “black-white” and
“mutton-sheep.” The members of the
former pair have a higher mutual 
probability of evocation; yet the 
members of the latter pair seem, in 
some sense, more related. The meas- 
ure described in this paper is quite 
distinct from other measures in that 
it is designed to capture the general 
level of verbal relatedness regardless 
of the specific kinds of relationships 
which may exist between the words 
under consideration. 

The remainder of the paper will 
take the following form. First, asso- 
ciative meaning will be defined. 
Second, the concept of relatedness in
terms of the intersect of associative 
meanings will be developed and a 
measure of this concept outlined. 
Third, an experimental investigation 
of the relationship between the meas- 
ure and judged relatedness will be 
reported. 


ASSOCIATIVE MEANING AND 
RELATEDNESS 


Associative Meaning 


The associative meaning of a word 
is defined as the ordered set of associates
given by an individual i at time
t in response to that word u. This 
definition is based upon two con- 
siderations. First, the associative 
meaning of a word for a particular 
individual may be thought of as all 
of the associates connected to that 
word by that individual. Each asso- 
ciate is, in a sense, a “partial meaning” 
(Noble, 1952). Therefore, each asso- 
ciate must be an integral part of the 
definition of meaning. Second, it has 
been suggested (Underwood & Schulz,
1960) that the associates to a given 
word have a relatively stable arrange- 
ment which reflects their psycho- 
logical importance and that a be- 
havioral estimate of this arrangement 
is found in their order of emission. 
In other words, the order of the ob- 
tained set of associates must be part 
of the meaning of the word; the first 
emitted associates carry a greater 
share of meaning than later associates. 
The following is a formal statement 
of the definition of associative 
meaning. 

Let u denote a stimulus word. We
define the associative meaning of u as
the sequence $A = (a_1, a_2, \ldots, a_j, \ldots, a_m)$,
where $a_1 = u$ and $a_2, \ldots, a_m$
are associates given to the stimulus
word by an individual at a particular
time.

It is assumed that the most important
component of the meaning of
any word is the word itself. Therefore,
$a_1 = u$ for all u.


Associative Relatedness 


The relatedness of the two words 
(u and v) is defined as a function of 
the degree to which their respective 
meanings (A and B) intersect or over- 
lap. The size of the intersect of two 
meanings is a function of both the 
number of common associates and the 
congruence of the rank orders of these 
elements. It should be noted that 
when all the elements of A and B 
are identical, but the orders of 
emission are different, relatedness is 
not complete. 

The measure of relatedness devel- 
oped below follows from the definition 
of associative meaning and is designed 
to account for the two factors which 
have been postulated to coinfluence 
degree of relatedness: the number and 
order of identical associative elements. 
It is an expression of the ratio of the 
obtained overlap to the maximum 
possible overlap. Each step in the 
derivation of the measure is presented 
below. Table 1 contains a numerical 
example of the calculations involved 
in determining the value of the 
measure. 

Let u and v be different stimulus
words. For a given individual let the
associative meaning of u be
$A = (a_1, a_2, \ldots, a_j, \ldots, a_m)$, where $a_1 = u$ and
$a_2, \ldots, a_m$ are associates to u. Define
the associative meaning of v as
$B = (b_1, b_2, \ldots, b_i, \ldots, b_n)$, where $b_1 = v$
and $b_2, \ldots, b_n$ are associates to v.
Let $m \le n$.

Weighting the associates. To each
element $a_j$ in the jth position in A
assign the rank $(n - j + 1)^p$, where n
is the number of elements in B and p
is some fixed number > 0 (p = 1 in
Table 1). Similarly, to $b_i$ in the
ith position in B assign the rank
$(n - i + 1)^p$.

This operation assigns weights according
to the assumed psychological
importance of the elements, the first 
emitted responses receiving the great- 
est weight. It also gives equal weight 
to the first elements in A and B even 
though the total number of elements 
in the sequences may differ. The pro- 
cedure of giving equal weight to the 
first associates is based upon the as- 
sumption that the importance of the 
first associate in a small hierarchy is 
at least as great as the importance of 
the first associate in a larger hierarchy. 
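
The weighting step can be illustrated with a short sketch. The function name `rank_weights` is hypothetical; the list lengths are those of the Table 1 example (four elements in the shorter sequence, five in the longer).

```python
def rank_weights(length, n, p=1):
    """Ranks (n - j + 1)**p for positions j = 1..length.

    n is the number of elements in the longer sequence (B), so the first
    element of either sequence always receives the rank n**p.
    """
    return [(n - j + 1) ** p for j in range(1, length + 1)]


weights_a = rank_weights(4, 5)        # shorter sequence, m = 4
weights_b = rank_weights(5, 5)        # longer sequence, n = 5
weights_a_sq = rank_weights(4, 5, p=2)
print(weights_a, weights_b, weights_a_sq)
```

Because both sequences are ranked from n downward, their first elements receive equal weight even though the sequences differ in length.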

The weighting exponent. The psy- 
chological importance of the emitted 
associates is a function of their order 
of emission. The weighting exponent 
p determines the exact nature of this 
function. In general, the higher the 
value of p the greater the weight 
given to first emitted associates. The 
appropriateness of a particular p
value depends upon the slope, or the 
shape of the probability distribution, 
of the associative hierarchy. This 
slope is controlled by stimulus, in- 
dividual, and situational variables. 
Let us briefly consider each of these 
factors. 
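
One way to see what the exponent does is to express each rank as a share of the total weight in a hierarchy. The normalization below is an illustrative device only and is not part of the measure itself; `weight_shares` is a hypothetical helper.

```python
def weight_shares(n, p):
    """Share of the total rank weight carried by each of n ordered positions."""
    weights = [(n - j + 1) ** p for j in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


shares_p1 = weight_shares(5, 1)
shares_p2 = weight_shares(5, 2)

# The first position carries a larger share of the total as p increases.
print(round(shares_p1[0], 3), round(shares_p2[0], 3))
```

With five positions, the first position's share rises from one third at p = 1 to 25/55 at p = 2, which is the sense in which a higher p suits a steeper hierarchy.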

Stimulus words that have dominant 
associates such as “table-chair” and
“light-dark” may be thought of as 
having steeply sloped hierarchies since 
their first associates carry a greater 
proportion of the words’ total mean- 
ings. For these words a relatively 
high p should be used. Words having 
no dominant associates have flat 
sloped hierarchies since all of the 
associates carry relatively equal 
amounts of meaning. Low p values 
are appropriate for these words. The 
most obvious sources of information 
concerning hierarchies associated with 
a particular stimulus word are the 
current normative data obtained from 
group associations (Russell & Jenkins, 
1954). 

Individuals may vary in their tendency
to produce steep or flat sloped
hierarchies. The hierarchies of some 
people may contain large numbers 
of approximately coequal associates 
while other individuals may possess 
steeply sloped hierarchies. Any per- 
sonality or individual variable which 
distinguishes individuals in terms of 
their tendency to produce steep or 
flat hierarchies should be useful in 
generating testable hypotheses con- 
cerning the nature of p. 

Any number of situational variables 
might bear upon the assignment of 
the weighting exponent. The work of 
Smith and Raygor (1956), in which 
the experimental procedure affected 
the commonality of the obtained 
responses, is one example of the pos- 
sible influence of situational factors. 

The appropriate value of p under 
any given circumstance will have to 
be determined empirically or in ac- 
cordance with some theory. 

For the sake of continuity in the 
development of the measure, further 
discussion of the determination of p 
will be delayed until we approach the 
problem on an empirical basis in 
Experiment 2. 

Assessment of the maximum possible
overlap. Let $\bar{A} = [n^p, (n-1)^p, \ldots, 1^p]$
and $\bar{B} = [n^p, (n-1)^p, \ldots, 1^p]$. Then

$$\bar{A} \cdot \bar{B} = n^p n^p + (n-1)^p (n-1)^p + \cdots + 1^p 1^p.$$

This product ($\bar{A} \cdot \bar{B}$) represents the intersect which
would be obtained if every element in
A appeared in B (and B in A) and in
the same order. $[n^p - (n-1)^p]^2$ is
an expression of the difference between
the product $\bar{A} \cdot \bar{B}$ and the $\bar{A} \cdot \bar{B}$ product
in the case where $a_1$ and $b_1$ are not
identical words.² Therefore, $\bar{A} \cdot \bar{B}$
minus $[n^p - (n-1)^p]^2$ represents
the maximum possible overlap in the
case where $a_1$ and $b_1$, the stimulus
words, are different.

Assessment of the obtained overlap.
Let $C = (c_1, c_2, \ldots, c_q)$ be obtained by
deleting all entries in A which do not
occur in B. Define $\hat{A}$ by replacing
each element in C by its corresponding
rank as an element in A. Similarly,
define $\hat{B}$ by replacing each element in
C by its corresponding rank as an
element in B. The product of these
two sequences ($\hat{A} \cdot \hat{B}$) represents the
obtained overlap. $\hat{A} \cdot \hat{B} = 0$ if A and
B have no common elements. The
product will be a maximum if all the
elements in A exist in B (and B in A)
and in the same order.

The relatedness coefficient (RC).

$$RC = \frac{\hat{A} \cdot \hat{B}}{(\bar{A} \cdot \bar{B}) - [n^p - (n-1)^p]^2}$$

RC is the ratio of the obtained overlap
to the maximum possible overlap.
It can assume values from 0 to 1.

² If the pair $a_1$, $b_1$ occurred the product
$\bar{A} \cdot \bar{B}$ would be
$(n^p)(n^p) + (n-1)^p (n-1)^p + \sum_{k=1}^{n-2} k^{2p}$.
If every pair but the pair $a_1$, $b_1$
occurred the product $\bar{A} \cdot \bar{B}$ would be
$n^p (n-1)^p + (n-1)^p n^p + \sum_{k=1}^{n-2} k^{2p}$.
The difference between these two expressions is
$n^p [n^p - (n-1)^p] - (n-1)^p [n^p - (n-1)^p] = [n^p - (n-1)^p]^2$.

TABLE 1

COMPUTATION OF A SAMPLE RC FOR THE
STIMULUS WORDS EAGLE AND BIRD (p = 1)

Associates    Rank    Associates    Rank
bird          5       eagle         5
wing          4       fly           4
fly           3       bird          3
nest          2       nest          2
                      claw          1

$A$ = (bird, wing, fly, nest)
$B$ = (eagle, fly, bird, nest, claw)
$\bar{A}$ = (5, 4, 3, 2, 1)
$\bar{B}$ = (5, 4, 3, 2, 1)
$C$ = (bird, fly, nest)
$\hat{A}$ = (5, 3, 2)
$\hat{B}$ = (3, 4, 2)

$$RC = \frac{(5, 3, 2) \cdot (3, 4, 2)}{(5, 4, 3, 2, 1) \cdot (5, 4, 3, 2, 1) - (5 - 4)^2} = \frac{31}{54} = .57$$
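
The full computation can be sketched as a single function. This is a minimal illustration that follows the definitions given above, under two stated assumptions: each associate appears only once within a sequence, and the longer sequence has at least two elements. The function name is hypothetical; the word lists are those of the Table 1 example.

```python
def relatedness_coefficient(meaning_a, meaning_b, p=1):
    """Ratio of obtained overlap to maximum possible overlap.

    Each argument is an ordered sequence whose first element is the stimulus
    word itself, followed by its associates in order of emission.
    Assumes no word is repeated within a sequence.
    """
    # Treat the shorter sequence as A so that n is the length of the longer one.
    if len(meaning_a) > len(meaning_b):
        meaning_a, meaning_b = meaning_b, meaning_a
    n = len(meaning_b)
    if n < 2:
        raise ValueError("the longer sequence needs at least two elements")

    # Rank (n - j + 1)**p for the j-th position (1-based) in either sequence.
    rank_a = {word: (n - j) ** p for j, word in enumerate(meaning_a)}
    rank_b = {word: (n - i) ** p for i, word in enumerate(meaning_b)}

    # Obtained overlap: cross-products of ranks over the common elements.
    obtained = sum(rank_a[w] * rank_b[w] for w in meaning_a if w in rank_b)

    # Maximum overlap: sum of squared ranks, corrected for the fact that the
    # two stimulus words differ and so cannot both occupy first position.
    full = sum(k ** (2 * p) for k in range(1, n + 1))
    maximum = full - (n ** p - (n - 1) ** p) ** 2
    return obtained / maximum


bird = ["bird", "wing", "fly", "nest"]
eagle = ["eagle", "fly", "bird", "nest", "claw"]
rc = relatedness_coefficient(bird, eagle, p=1)
print(round(rc, 2))
```

For the Table 1 lists the obtained overlap is 31 and the maximum is 54. Swapping the two arguments leaves the result unchanged, since the shorter sequence is always treated as A.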


EXPERIMENT I


This experiment was designed as a 
test of the validity of the RC measure. 
It was hypothesized that RC values 
based upon individual associative 
hierarchies would correlate positively 
with individual judgments of related- 
ness. The subjects associated to each 
member of 24 pairs of nouns. RC was 
computed for each of these pairs and 
correlated with subjective judgments 
of the relatedness of the same pairs. 


Subjects 


The subjects were 20 University of Michi- 
gan undergraduates enrolled in psychology 
courses. Participation in the experiment was 
a course requirement. 


Materials 


Word pairs. The authors were interested 
in obtaining a series of pairs that spread over 
the “relatedness continuum” so that the 
effectiveness of RC could be tested against 
varying degrees of relatedness. Twenty-four 
pairs of nouns were chosen by the following 
technique. Eight highly related pairs were 
chosen from Mawson (1942) by pairing ran- 
domly chosen nouns with one of their first 
synonyms. A second less related category was 
chosen by randomly picking a noun from the 
Thesaurus, choosing one of its synonyms and
then looking up one of this synonym’s syno- 
nyms and pairing it with the original noun. 
Eight pairs assumed to be minimally related 
were chosen and matched by a random 
procedure. 

The frequency of occurrence of the members 
of the pairs was controlled through the use 
of the Thorndike-Lorge Word Frequency 
Count (1944). Two pairs at each level of
relatedness were composed of the following
frequencies: high-high, low-low, low-high, and 
high-low. The high words were AA words in 
the Thorndike-Lorge count, low words were 
A or below. 

The judgment scales. Each subject was 
presented with a 24-page booklet. On each 
page, 1 of the 24 pairs of words was printed 
above a 5-inch line with the ends marked 0 
to 1. There were no divisions on the line. 
Each booklet contained all the pairs in a 
different order. 


The association materials. The association
material consisted of a 48-page booklet. On
each of the pages 1 of the 48 stimulus words
was printed 50 times in order to minimize
chaining, or the influence of an associate
upon subsequent responses. A blank space
followed each printing of the word. The
pages in these booklets were arranged in four
random orders. Each booklet contained all
48 words.

TABLE 2

TWENTY-FOUR WORD PAIRS ARRANGED BY
THESAURUS DETERMINED RELATEDNESS GROUPS

High related       Medium related       Low related
House-home         Boat-vessel          [illegible]
Ocean-sea          Night-shadow         Summer-church
Yolk-egg           Emblem-label         Lip-bacon
Midnight-dark      Closet-storehouse    Ash-gin
Bandit-thief       Path-street          Navy-glass
Lizard-reptile     Table-bread          City-sugar
Restaurant-food    King-governor        Jail-napkin
Morning-dawn       Lunatic-fool         Whisker-prize


Procedure 


The 20 subjects, tested in a group, were 
given the association booklet and read the 
following instructions, “On each of the follow- 
ing pages you will find a stimulus word listed 
many times. After each word is a blank 
space. Your task is to associate to the 
stimulus word. Look at the stimulus word and 
write in the space provided the word it makes 
you think of. You are not expected to fill in 
all the blank spaces, but do the best you can.” 

The subjects were given 1 minute to as- 
sociate to each stimulus word. When time 
expired for a particular word the experimenter 
would say “stop,” then instruct the subjects 
to turn the page and begin associating again. 

After a 5-minute rest period the subjects 
were given the judgment booklets with the 
instructions printed on the front. The experi- 
menter read the following instructions. 

On each of the following pages you will 
find pairs of words. Your task is to judge 
how related these two words are and to 
indicate your judgment by placing an “X” 
on the line which you will find below each 
pair of words. For instance, if you felt 
that two words were fairly unrelated you 
would place an “X” somewhere on the “0” 
end of the line; if two words seemed to be 
very related you would place your “X”
somewhere on the “1” end of the line. Try 
and make the position of your “X” corre- 
spond to the degree to which you feel the 
two words are related. 

No time limit was imposed. The subjects
were allowed to reconsider previously made
judgments.


Results 


The judgment scores. The judgment
score for a pair of words was the per- 
centage of the line delineated by the 
subject’s X to the nearest sixteenth 
of an inch. The projection of the 
intersect of the X on the line scale was 
taken to be the judgment point. 

The relatedness coefficient (RC). RC 
was computed for each subject's asso- 
ciations to each pair of words ac- 
cording to the method described in 
Table 1. The computational proce- 
dure for determining RC involved the 
calculation of the ratio of the sum of 
the cross-products of the ranks of the 
common elements (raised to some 
power) to the sum of the squares of 
the ranks (raised to some power) of 
the larger hierarchy. RC was com- 
puted for both p = 1 and p = 2. 

Comparison of the RC and judg- 
ment scores. Rank-order correlations 
were computed between each subject's 
RC and judgment scores for the 24 
word pairs. Table 3 contains the 
individual correlations for both values 
of p. All correlations reached the .01 
level of significance. 
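
A rank-order correlation of this kind can be sketched as follows. The text does not say how tied scores were handled, so the use of average ranks here is an assumption; the function names and the five pairs of scores are hypothetical and are not data from the experiment.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Rank-order correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical RC values and line judgments for five word pairs (one subject).
rc_scores = [0.57, 0.10, 0.35, 0.00, 0.80]
judgments = [0.70, 0.30, 0.20, 0.05, 0.95]
rho = spearman_rho(rc_scores, judgments)
print(round(rho, 2))
```

In the experiment the same calculation would be run once per subject over all 24 word pairs, and once for each value of p.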


TABLE 3

RANK ORDER CORRELATIONS BETWEEN
SUBJECT'S LINE JUDGMENTS AND RC
SCORES FOR TWO VALUES OF p

               Rho                          Rho
Subject    p = 1    p = 2    Subject    p = 1    p = 2
1-6        [illegible]       11-16      [illegible]
7          .79      .80      17         .67      .65
8          .92      .78      18         .65      .71
9          .7[?]    .74      19         [illegible]
10         .80      .76      20         .63      .68
Comparison of the effects of the two p
values. The number of correlations
falling above and below the median
correlation for both weighting exponents
was equal, indicating that the
two p values did not differentially 
affect the correlations. 

Mean RC values. RCs and line 
judgments were averaged over all 
20 subjects. The rank order corre- 
lation between these means was .94 
(p < .01). 

Comparison of RC and Cofer’s Mf. 
An attempt was made to compare 
Cofer’s Mf (1957) and RC. Using the 
first associate given by each subject 
in Experiment I, Mfs were computed 
for each word pair. Since 13 of the 24 
pairs obtained a Mf of 0 it was un- 
feasible to rank order correlate Mfs 
and line judgments for all 24 pairs. 
Instead a correlation was performed 
on the 11 pairs which did obtain an 
Mf greater than 0. This correlation 
was nonsignificant (R = -.33). A
rank order correlation between RC 
scores and line judgments for these 
same 11 pairs was .97 (p < .01). 


Discussion 


The obtained significant correla- 
tions support the hypothesis that 
judged relatedness between two words 
varies with the size of the overlap of 
the associative hierarchies connected 
to those words. 

Although correlation does not de- 
termine causality, the authors infer 
that judged relatedness depends upon 
associative relatedness. If this con- 
tention is accurate, the obtained 
results clearly demonstrate the de- 
pendency of individual behavior upon 
individual associative structure. 

Although we have emphasized the 
individual use of RC, some evidence 
has been compiled concerning the 
application of RC to group data. 
First, the reported significant correla- 
tion between mean RCs and mean line 
judgments suggests that RC can
generate highly accurate group pre- 
dictions. Second, a clustering study 
has been undertaken using these mean 
RCs. Preliminary results suggest that 
a highly significant positive relation- 
ship exists between amount of group 
clustering in free recall and degree of 
RC. Finally, it has been found that 
the mean RCs from Experiment I
correlate positively with mean line
judgments obtained from another
group of 30 subjects (R = .90,
p < .01).


The fact that Mf obtained a value 
of zero in 13 of the 24 word pairs 
suggests that it may not be sensitive 
to low degrees of relatedness which are 
ascertainable through the use of RC. 
Mf is based upon only one association 
per subject while the continuous 
association task employed in the RC 
technique allows a greater total num- 
ber of different associations to be 
evoked (Cofer, 1958). The chances 
of low probability associations com- 
mon to both stimulus words being 
evoked are thereby increased. 

The nonsignificant correlation be- 
tween Mf and line judgments suggests 
that, at least for high values of Mf 
and judged similarity, the measure 
was not an accurate predictor in the 
present study. 

However, the strength of these con- 
clusions should be tempered by the 
notion that 20 subjects may not be an 
adequate N with which to obtain 
accurate Mfs (Cofer employed an 
N of 356). 

The list of words employed in this 
study was limited in the sense that no 
opposites were included. Further 
study of the relationship between RC 
and judged similarity will expand 
upon the type of word pair employed. 

The lack of a difference between 
the relative efficacy of the two weight- 
ing exponents suggests that subjects 
and words employed in the experiment 
were heterogeneous enough to mask
the importance of either value. For
instance, if p = 1 was the appropriate
exponent for half the subjects and
p = 2 appropriate for the other half, then
grouping the results of both halves 
would render the effects of the two 
values indistinguishable. 


EXPERIMENT II 


This experiment was designed to 
demonstrate the predictive power of 
the weighting exponent (p) which may 
have been masked in Experiment I.
Through the manipulation of subject, 
word, or situational variables it should 
be possible to demonstrate that, in 
some instances, a p value of 2 yields 
better correlations than does a value 
of 1. Although relevant word and 
situational variables might as easily 
have been employed, a subject vari- 
able was investigated in the present 
study. 

According to the theory presented 
earlier, p = 2 might produce better 
approximations of the correct weight- 
ing for individuals whose first emitted 
associates carried much of the mean- 
ing of the stimulus word while p = 1 
would be appropriate in the case of 
individuals for whom the first emitted 
associates were approximately as im- 
portant as later associates. If the 
subjects could be differentiated in 
terms of their tendency to attribute 
more or less importance to the first 
emitted associates, it should be pos- 
sible to demonstrate the predictive 
power of p. 

Mednick (1962) has proposed a 
theory and measure of creativity 
which enables us to differentiate the 
subjects according to this attribute. 
His theory states that the highly 
creative individual possesses flat asso- 
ciative hierarchies while the low 
creative has steep hierarchies. Flat 
associative hierarchies are those in 
which the probabilities of evocation of a
large number of associates are relatively
equal. Steep hierarchies are
those in which a few associates are 
extremely probable while others are 
quite improbable. Mednick presents 
evidence for the conclusion that his 
test of creativity, the Remote Asso- 
ciates Test (Mednick, 1962), spreads 
individuals along this steep-flat hier- 
archy dimension. 

The hypothesis is that correlations 
between RC and judgment scores 
should be higher for high creatives 
when p = 1 and that correlations
should be higher for low creatives 
when p = 2. This is based upon the 
notion that, in the case of an in- 
dividual with a flat associative hier- 
archy, a close approximation to the 
true weighting exponent is represented 
by p = 1. The value p = 2, which gives more
weight to the first emitted associates, 
is proposed as the appropriate value 
in the case of a steep hierarchy 
individual for whom the first emitted 
associates carry more of the meaning 
of the stimulus word. 


Procedure 


The subjects were the same 20 under- 
graduates employed in Experiment I. In 
addition to the associations gathered in 
Experiment I, Remote Associates Test scores 
were obtained for all the subjects. The 
Remote Associates Test (RAT) was ad- 
ministered several months prior to the 
Experiment I test session. 

The subjects were not aware of the con- 
nection between the two sessions. On the 
basis of their RAT scores the subjects were 
divided at the median into high and low 
creativity groups. The scores of the low 
creativity group ranged from 7 to 19 with a 
mean of 13.8. The high creativity group 
scores ranged from 20 to 28 with a mean of 
23.5. Possible scores on the RAT vary from 
0 to 30. 


Results and Discussion 


The correlations between RC and 
judgment scores were available from 
Experiment I for both p values. A 
count was taken of the number of high 
creatives whose correlation was higher 
under the p = 1 condition and the 
number of low creatives whose corre- 
lation was higher under the p = 2 con- 
dition. Eight high creatives showed 
higher rho’s when p = 1 while 2 high 
creatives showed higher rho’s when 
p = 2. Seven low creatives showed 
higher rho’s when p = 2 while 3 low 
creatives showed higher rho’s when 
p = 1. These frequencies were sub- 
jected to Fisher’s exact test. The 
obtained probability value of .036 
(one-tailed) indicates that the crea- 
tivity group correlations showed dif- 
ferential improvement under the two 
ranking conditions. These results 
suggest that 2 is the appropriate p 
value for low creatives while 1 is more 
suitable for highly creative subjects. 
The results of this experiment imply 
that the weighting exponent is both 
meaningful and subject to empirical 
investigation. With proper considera- 
tion of individual, stimulus word, 
and situational variables appropriate 
weightings should be attainable. 
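
The Fisher exact probability for the four frequencies reported above can be recomputed directly. The following Python sketch is offered only as an illustration and is not the authors' own computation; the function name `fisher_one_tailed` and the row and column layout of the table are assumptions introduced here. It sums the hypergeometric probabilities of the observed table and of every table more extreme in the predicted direction, with all marginal totals held fixed.

```python
from math import comb


def fisher_one_tailed(a, b, c, d):
    """One-tailed Fisher exact probability for the 2 x 2 table [[a, b], [c, d]].

    Returns P(X >= a) with all margins fixed, i.e. the chance of a
    top-left count at least as large as the one observed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    favorable = sum(
        comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1)
    )
    return favorable / total


# Rows: high creatives, low creatives.
# Columns: rho higher when p = 1, rho higher when p = 2.
prob = fisher_one_tailed(8, 2, 3, 7)
print(round(prob, 3))
```

The value obtained this way should lie close to the reported probability; a small difference would not be surprising, given rounding or a different computing routine.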
There are two approaches to the 
determination and use of the weighting 
exponent p which should be made 
clear. First, p may be useful in test- 
ing theoretical notions concerning the 
shape of associative probability dis- 
tributions. This was the approach 
employed in the experiment reported 
above. The general procedure for 
testing theoretical propositions in- 
volves two steps. First, p values 
suggested by some theoretical con- 
sideration are used to determine 
values of RC. Second, these RC 
values are correlated with some set of 
dependent variables which are ac- 
ceptable estimates of relatedness (gen- 
eralization, ease of learning, transfer, 
judged or related similarity, etc.). If 
we find that the RCs based upon p 
values generated from our theory have 
high predictive power, we feel our 
theoretical position is supported. In 
the reported experiment the hy- 
pothesis that highly creative indivi- 
duals have flat associative hierarchies 
and that low creative subjects have 
steep hierarchies was supported. 

The second approach to the de- 
termination of p is empirical and 
independent of theoretical considera- 
tions. In this technique associative 
probability distributions are obtained 
under some given set of circumstances. 
These distributions are then “fit” with 
appropriate p values. For instance, 
for a given group of subjects in a 
particular situation the word “table” 
may elicit a very limited steep hier- 
archy. Values of p would then be 
applied to these data until the most 
appropriate one was discovered, i.e., 
the one that most closely approxi- 
mated the shape of the obtained dis- 
tribution. In the case of the word 
“table” p would probably be some 
relatively high positive number. 
Values determined in this manner 
would be assumed to have some 
general applicability. Before we can 
fit p values to associative probability 
distributions we must obtain prob- 
ability distributions. If we are con- 
cerned with word variables and are 
holding individual and situational 
factors constant, it is relatively easy 
to obtain associative probability dis- 
tributions. For instance, the word 
association norms developed by Rus- 
sell and Jenkins (1954) are convenient 
estimates of the associative prob- 
ability distributions of 100 common 
words. Similarly, when we are con- 
sidering situational variables, prob- 
ability distributions may be obtained 
from group associations to different 
words. But when our interest is in 
individual variables we cannot depend 
upon group data; we must obtain 
probability distributions from each 
individual. In this case measures 
such as latency, resistance to for- 
getting, and frequencies based upon 
repeated associations to the same 
words should be applicable. 
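
The empirical fitting procedure described above can be sketched in Python. The sketch rests on an assumption that is not stated in this section: that the i-th emitted associate out of n carries a weight proportional to (n - i + 1) raised to the power p, so that the first associate has rank n and the last has rank 1. The names `rank_weights` and `fit_p`, the candidate grid, and the frequencies are all hypothetical and serve only to illustrate the idea of trying p values until the closest fit is found.

```python
def rank_weights(n, p):
    """Normalized weights for n associates in order of emission.

    ASSUMPTION: the first associate has rank n, the last rank 1, and each
    weight is rank ** p divided by the sum of all such terms.
    """
    raw = [float(n - i) ** p for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def fit_p(observed, candidates):
    """Return the candidate p whose weights lie closest (least squares)
    to the observed distribution, listed in order of decreasing frequency."""
    n = len(observed)
    total = float(sum(observed))
    target = [x / total for x in observed]

    def squared_error(p):
        return sum((w - t) ** 2 for w, t in zip(rank_weights(n, p), target))

    return min(candidates, key=squared_error)


# Hypothetical frequencies for five associates to one stimulus word.
steep = [25, 16, 9, 4, 1]
grid = [k / 4 for k in range(-8, 17)]  # candidate p values from -2.0 to 4.0
print(fit_p(steep, grid))
```

A grid search is used here only because it mirrors the verbal description of applying successive p values to the data; any other curve-fitting routine would serve as well.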

One final note about the range of 
values p can assume. Although we 
have employed p values of only 1 and 
2 the theory does not restrict p to 
this range. Theoretically p can as- 
sume any value. For instance, it is 
interesting to speculate about the 
value of p in the case where the 
stimulus word is highly charged with 
affect. We might suspect that the 
order of emission would be inversely 
related to psychological importance. 
If this was so, the appropriate p value 
would be negative. 

Even though it is possible to define 
empirically p values which “fit” 
particular circumstances, it should be 
remembered that any exponential 
function is merely an approximation 
of a function which is probably very 
complex. 


EXPERIMENT III 


If the hierarchies evoked in one 
experimental situation were mani- 
festly different from the hierarchies 
obtained in another session, the use- 
fulness of RC would be in question. 
In order to test the stability of RC the 
20 subjects used in Experiment I were 
asked to participate in a second ex- 
perimental task. In this session the 
subjects associated to the same stimu- 
lus words under conditions compar- 
able to the first session. It was hy- 
pothesized that the RC scores obtained 
on the two different occasions would 
correlate positively. 

A second hypothesis was based 
upon Mednick’s 1962 finding that 
high RAT scorers have a greater 
repertoire of available associates than 
do low scorers. These results suggest 
that high RAT scorers will produce 
hierarchies which vary more from 
situation to situation than do the 
hierarchies of low RAT subjects. It 
was hypothesized that the stability of 
the high RAT scorers would be 
significantly less than the stability of 
the low scorers. 


TABLE 4 
RANK ORDER CORRELATIONS BETWEEN RC VALUES OBTAINED ON 
DIFFERENT NIGHTS FROM CREATIVITY GROUPS 

  High creativity        Low creativity 
  Subject    Rho         Subject    Rho 
     4       .85            2       .91 
     3       .83           14       .89 
    11       .82           10       .89 
    15       .80            1       .88 
     7       .80           13       .87 
     5       .80           17       .84 
     9       .79           12       .84 
    16       .77            6       .84 
    18       .72           20       .81 
     8       .72           19       .78 


Procedure 

The subjects and the association booklets 
were identical to those employed during 
Experiment I. On the night following Ex- 
periment I subjects again associated to the 
same 48 stimulus words. The same in- 
structions and procedures were followed with 
the addition of the following statement read 
to the subjects. “Try to associate to these 
words as if you had not already done so.” 


Results and Discussion 


For each subject the RC scores ob- 
tained during the first session were 
correlated (rank order) with the RC 
values obtained during the second 
session (p = 1). Table 4 contains 
these correlations. 
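
The rank order correlation between one subject's RC values on the two nights can be illustrated with a short Python sketch. It is not the authors' procedure in detail: the function names are introduced here, tied values are given average ranks (an assumption, since the handling of ties is not described), and the RC values shown are invented for the example.

```python
def average_ranks(values):
    """Rank the values from 1 (smallest) upward, giving tied values
    the mean of the ranks they jointly occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Rank order correlation: the product-moment r computed on ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical RC values for five word pairs on the two nights.
first_night = [0.10, 0.25, 0.40, 0.55, 0.70]
second_night = [0.20, 0.15, 0.50, 0.45, 0.80]
print(spearman_rho(first_night, second_night))
```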

Each of the correlations was signifi- 
cant at greater than the .01 level. 
The subjects were split into the same 
high and low creativity groups that 
were employed in Experiment II. 
A count was taken of the number of 
high creative and low creative subjects 
falling above and below the median 
stability correlation. Eight high crea- 
tive subjects were below and two were 
above the median. This is in contrast 
to the low creative group, eight of 
whom were above and two below the 
median stability correlation. Fisher’s 
exact test, when applied to these fre- 
quencies, yielded a probability of .014 
(one-tailed). 
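
The median split described above can be checked against the correlations listed in Table 4. The Python sketch below is illustrative only; the dictionary layout and the function name `split_counts` are introduced here, and the value for Subject 16 is read as .77, which is an assumption because that entry is not fully legible in the table.

```python
from statistics import median

# Stability correlations from Table 4, keyed by subject number.
# ASSUMPTION: Subject 16 is read as .77 (entry partly illegible).
high = {4: .85, 3: .83, 11: .82, 15: .80, 7: .80,
        5: .80, 9: .79, 16: .77, 18: .72, 8: .72}
low = {2: .91, 14: .89, 10: .89, 1: .88, 13: .87,
       17: .84, 12: .84, 6: .84, 20: .81, 19: .78}


def split_counts(group, cut):
    """Count how many correlations in a group fall above and below the cut."""
    above = sum(1 for rho in group.values() if rho > cut)
    below = sum(1 for rho in group.values() if rho < cut)
    return above, below


# Median of all 20 stability correlations, then the tallies per group.
cut = median(list(high.values()) + list(low.values()))
high_counts = split_counts(high, cut)
low_counts = split_counts(low, cut)
print(high_counts, low_counts)
```

The tallies depend only on which side of the median each correlation falls, so the uncertain reading for Subject 16 does not affect them as long as its value lies below the median.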

The high positive correlations ob- 
tained between the RCs support the 
hypothesis that RC is a reliable 
measure. These correlations indicate 
that the evoked hierarchies are stable 
enough to justify the use of the first 
night’s samples as representative as- 
sociative meanings. 

Future use of this measure should 
entail modifications of procedure so 
that more representative samples of 
individual hierarchies might be ob- 
tained. This may be accomplished by 
repeated association sessions from 
which average ranks can be computed. 
Further, although the instructions 
and the form of the associating 
material were designed to discourage 
chaining, it still occurred. Obviously, 
such chain associating can only lower 
correlations as it increases the size of 
the denominator of RC while it lessens 
the chance of overlapping associates. 
Repeated visual presentation or ver- 
bal repetition of stimuli by the 
experimenter would further reduce 
chain associating. 

The significant difference between 
the stability of the RCs of the high and 
low RAT scorers again demonstrates 
the importance of individual variables 
in the consideration of associative 
relatedness. 


REFERENCES 


BOUSFIELD, W. A., WHITMARSH, G. A., & 
DANICK, J. J. Partial response identities 
in verbal generalization. Psychol. Rep., 
1958, 4, 703-713. 

COFER, C. N. Associative commonality and 
rated similarity of certain words from 
Haagen’s list. Psychol. Rep., 1957, 3, 
603-606. 

COFER, C. N. Comparison of word associa- 
tions obtained by the methods of discrete 
single word and continued association. 
Psychol. Rep., 1958, 4, 507-510. 

DEESE, J. On the structure of associative 
meaning. Psychol. Rev., 1962, 69, 161-175. 


FLAVELL, J. H. Meaning and meaning 
similarity: I. A theoretical reassessment. 
J. gen. Psychol., 1961, 64, 307-319. (a) 

FLAVELL, J. H. Meaning and meaning 
similarity: II. The semantic differential 
and co-occurrence as predicters of judged 
similarity in meaning. J. gen. Psychol., 
1961, 64, 321-335. (b) 

FLAVELL, J. H., & JOHNSON, ANN B. Mean- 
ing and meaning similarity: III. Latency 
and number of similarities as predicters of 
judged similarity in meaning. J. gen. 
Psychol., 1961, 64, 337-348. 

HAAGEN, C. H. Synonymity, vividness, 
familiarity, and association value ratings 
of 400 pairs of common adjectives. J. Psy- 
chol., 1949, 27, 453-463. 

JENKINS, P. M., & COFER, C. N. An ex- 
ploratory study of discrete free association 
to compound verbal stimuli. Psychol. 
Rep., 1957, 3, 599-602. 

MAWSON, C. O. Roget's thesaurus of the 
English language in dictionary form. New 
York: New Home Library, 1942. 

MEDNICK, S. A. The associative basis of the 
creative process. Psychol. Rev., 1962, 69, 
220-232. 

NOBLE, C. E. An analysis of meaning. 
Psychol. Rev., 1952, 59, 421-430. 

OSGOOD, C. E., SUCI, G. J., & TANNENBAUM, 
P. H. The measurement of meaning. 
Urbana: Univer. Illinois Press, 1957. 

RUSSELL, W. A., & JENKINS, J. J. The 
complete Minnesota norms for responses 
to 100 words from the Kent-Rosanoff word 
association test. Tech. Rep. No. 11, 1954, 
University of Minnesota, Contract N8 onr- 
66216, Office of Naval Research. 

SMITH, D. E. P., & RAYGOR, A. L. Verbal 
satiation and personality. J. abnorm. soc. 
Psychol., 1956, 52, 323-325. 

THORNDIKE, E. L., & LORGE, I. A teacher's 
word book of 30,000 words. New York: 
Teachers College, Columbia University, 
Bureau of Publication, 1944. 

UNDERWOOD, B. J. An orientation to research 
on thinking. Psychol. Rev., 1952, 59, 
209-220. 

UNDERWOOD, B. J., & RICHARDSON, J. Some 
verbal materials for the study of concept 
formation. Psychol. Bull., 1956, 53, 84-95. 

UNDERWOOD, B. J., & SCHULZ, R. W. Mean- 
ingfulness and verbal learning. New York: 
Lippincott, 1960. 

WHITMARSH, G. A., & BOUSFIELD, W. A. 
Use of free associational norms for the 
prediction of generalization of salivary 
conditioning to verbal stimuli. Psychol. 
Rep., 1961, 8, 91-95. 


(Received February 9, 1962) 


VOL. 70, No. 4 JULY 1963 


PSYCHOLOGICAL REVIEW 


INVESTED SELF-EXPRESSION: 


A PRINCIPLE OF HUMAN MOTIVATION 
RAYMOND J. McCALL 


Marquette University 


A taxonomy and rationale of basic human motives on phenomenologi- 
cal and logical grounds. “Basic” means principally unlearned though 
environmentally released. Motive is defined as felt tendency re- 
specting the cognized desirable-undesirable, the cognitive element dis- 
tinguishing it from need. 4 biological motives called “categorical” 
are acknowledged plus 3 or 4 others designated “pre-emptive.” A 
number of co-primary but nonbiological “anastatic” motives are 
postulated and one basic social motive, affiliation. Behaviorism and 
Freudianism are criticized for failure to recognize the basic nature of 
anastatic and affiliative motivation and Harlow’s research is cited 
as corroborative. A correlative theory of anxiety and hostility is also 
set forth and the principle of invested self-expression suggested as 
linking self-actualization and ego involvement or propriateness. 


The psychologist of today may flatter 
himself on having transcended Wundt’s 
expectation that psychology could only 
progress in the Baconian inductive 
manner by the introspective dismem- 
berment of states of consciousness. 
He may also consider himself well 
rid of Fechner’s philosophical pre- 
tensions of establishing a science of 
mind by the precise measurement of 
sensory stimuli coordinated with ex- 
perienced changes in consciousness. 
Yet there is in both Wundt and Fech- 
ner a laudable recognition of unique- 
ness in the scientific psychological en- 
terprise, which our contemporaries may 
overlook in their self-congratulatory 
concern with behavioral rather than 
introspective data. 

We may thus recognize, besides the 
stimulus error which Titchener damned, 
the response error which the behavior- 
ists canonized by regarding behavior 
as the terminus ad quem as well as the 
terminus a quo of psychological study. 
If we attend carefully to what they 
deal with in their writings rather than 
to their definitions, we can say, nev- 
ertheless, that the vast majority of to- 
day’s psychologists are as much con- 
cerned with the inner world of the 
psyche as were Wundt and Fechner. 
They may believe it can be reached by 
a different route from the introspective 
one, but only a few doctrinaire be- 
haviorists seem to be interested in be- 
havior for its own sake. Though most 
of us study behavior, what we are in- 
terested to arrive at is the lineaments 
of the inner world of tendential vari- 
ables which govern behavior. It is 
not even behavior as such which is our 
primary datum but rather the regu- 
larity or consistency of behavior, and 
it is this which leads us to postulate an 
inner world of capacities and disposi- 
tions—these are what is meant by 
“tendential variables” above—which 
will account for this perceived regu- 
larity. The psychologist studies be- 
havior as the physicist studies displace- 
ment or mechanical transformation, in 
order to arrive at “structures” and 
“forces” or other explanatory concepts 
which will enable him to understand, 
predict, and control the kind of data 
with which his investigation began. 
The psychologist, like every other sci- 
entist, in short, is interested not in facts 
per se, but in explanations of facts, 
specifically as a psychological scientist 
in capacities and dispositions like “per- 
ception,” “learning,” “intelligence,” 
“mechanical aptitude,” “motor skill,” 
“habit,” “set,” “motive,” “interest,” 
“value,” “trait,” which can render in- 
telligible the inchoate mass of be- 
havioral data. 

What distinguishes the scientific 
from the philosophical or literary or 
common sense psychologist is not ab- 
sence of concern with these “mental- 
istic” intervening or tendential vari- 
ables but the determination to deal 
with them within the confines of the 
“empiriological” method, or only inso- 
far as they can be specified or denoted 
by way of controlled observation and 
measurement, not directly, of course, 
but inferentially. 

In the brief space available this pa- 
per will concentrate on one area of this 
inner world of tendential variables, the 
area of motivation, of what is some- 
times called the “springs of action.” 
It will suggest a taxonomy of basic mo- 
tives as a qualitative and at best par- 
tially validated foundation for what 
may eventually be a theory of human 
motivation.¹ 


¹The perceptive reader may see in this 
taxonomy an attempt to do again what has 
been variously accomplished earlier by 
Henry Murray, Abraham Maslow, Robert 
Woodworth, and by their common ancestor, 
William McDougall. My debt to all of 
these is great. Less evident but no less real 
is my indebtedness to Edward Tolman’s 
(1942) taxonomy in his Drives toward War 
and the scholarly review of theories of mo- 
tivation found in Gardiner, Metcalf, and 
Beebe-Center (1937) Feeling and Emotion. 
If the positions taken in this paper are quite 
opposed to many of these prior formulations, 
this fact may justify their statement with- 
out minifying their dependence. 


THE NATURE OF HUMAN MOTIVES 


It is tempting at the outset to de- 
scribe these basic human motives as 
unlearned, which would mean either 
present at birth or appearing at some 
predetermined point in the life span 
as a result of intrinsic or endogenic 
maturational processes taking place in 
a merely sustaining or supportive en- 
vironment. Yet there is much to be 
said for Solomon Asch’s observation 
that the learned-unlearned dichotomy 
is largely inapposite to human moti- 
vation. Hunger, for example, must 
certainly be classed as a basic motive 
in man, but as Asch (1952) says: 


No needs—not even the earliest and 
simplest—refer at the outset to, or con- 
tain a representation of, the objects that 
might quiet them. The infant is uncom- 
fortable and restless when hungry or thirsty, 
but this condition does not yet contain a ref- 
erence to food or drink. Needs are at first 
objectless or goalless. To bring the con- 
dition of hunger into relation with the ob- 
jects that satisfy it a specific form of ex- 
perience is necessary. The organism must 
encounter the object and feel its pleasurable 
or painful effects. Such an encounter is the 
necessary condition for establishing the prop- 
erties of the object as a goal or its relevance 
to the need. When the relevance has been 
experienced and has altered the organism by 
establishing a trace of itself we observe the 
transition from a condition of need to a 
state of motivation [p. 83]. 


Yet even, Asch continues, attitudes of 
the organism like gratitude and humor 
which could not exist apart from a 
great deal of learning 


cannot, in the strict sense of the word, be 
taught. The individual must generate the 
reaction himself; he must respond to given 
conditions in specific ways that do not fall 
under the rubrics of innate or learned. 

With regard to the most elementary 
motives like hunger and thirst, how- 
ever, it seems that we are dealing here 
with a human phenomenon somewhat 
analogous to the phenomenon of im- 
printing in the lower animals, or the 
releasing of a well-organized behavior 
sequence by a single or very limited 
experience rather than with an associ- 
ative learning following the law of ef- 
fect as Asch seems to think (cf. Hess, 
1959). If in the previous quotation 
we omit the reference to “pleasurable 
or painful effects,” therefore, we can 
recognize a great deal of merit in 
Asch’s suggestions that specific experi- 
ence plays a role in the most “innate” 
motives and that endogenic factors de- 
termine in great measure the most ob- 
viously “learned” motives. 

With this reservation in mind we 
may characterize the basic motives as 
principally unlearned and as having an 
organized and predictable quality sug- 
gestive of innateness, granted the ap- 
propriate “releasers” in addition to a 
sustaining environment.² 


²It may be that future empirical investi- 
gation will establish a greater role for par- 
ticular experiences in the specification and 
structuring of these motives, but so far the 
necessary experimental evidence appears to 
be lacking. In any event, if we grant that 
both learned and unlearned determinants 
apply to all human motives, we must hold 
that the amount of weight to be accorded to 
each determinant in regard to a particu- 
lar motive can be settled only by empiri- 
cal (preferably experimental) study. 


In accordance with widespread prac- 
tice, we may assume the existence 
within the human organism of a gen- 
eral élan or upward push as lying be- 
hind specific motives. Perhaps this 
postulated generic force may be de- 
scribed as the principle of self-actuali- 
zation, but it does not seem necessary 
at the outset to endow this principle 
with very much philosophical richness 
and precision. Rather, we may regard 
it simply as a scientifically relevant but 
intrinsically nonscientific postulate 
which covers the probabilistically gen- 
eral—but not absolutely universal— 
human disposition toward self-mainte- 
nance and toward a fuller existence 
(self-enhancement and self-expres- 
sion). We shall return to this postu- 
late subsequently. 

In defining motives, or the specific 
manifestations of this upward push, 
it seems most important to distinguish 
even the most elementary biological 
motives carefully from reflexes and 
from needs. A motive is thus not just 
a readiness or disposition to organized 
—or even purposive or adaptive—ac- 
tivity. A reflex anterior to its elicita- 
tion is that. A human motive is rather 
a felt tendency toward or away from an 
object cognized as in some sense de- 
sirable or undesirable, good or bad 
for us as we see it. 


Drives 


The first examples of primary mo- 
tives which occur to anyone are those 
elementary urges based upon physio- 
logical disturbance or lack and which 
psychologists have accustomed us to 
call “drives.” It is significant, how- 
ever, that most biological needs do not 
reach the status of motive or drive 
simply because they are not felt as 
such. Thus no one is inclined to pos- 
tulate a regular and predictable insulin 
seeking or calcium seeking drive in 
man, though insulin and calcium are 
absolute requirements for continued 
life. Even the fundamental need for 
air or oxygen does not seem to attain 
the quality of a basic drive or biologi- 
cally based motive, since it is expressed 
either in a reflex and unconscious or 
automatic tendency, or else it is a prin- 
cipally learned motive. 

When we define a motive as a felt 
tendency, we do not imply that the 
cognitive element is necessarily specific 
and clear, and certainly not that the 
motive is necessarily conscious in the 
sense that we understand its rationale 
or basis. “Felt” means only that some 
cognizance of a desirable or undesirable 
object is minimally present. 

The “object” of a motive, it should 
be added, tends to be an activity or 
condition rather than a thing in itself. 
Generally, what we are motivated to- 
ward is activity about a certain situa- 
tion, thing, or person; e.g., eating food, 
emptying the bladder, going for a walk, 
courting a girl, beating an opponent at 
tennis or gin rummy, solving a prob- 
lem or a puzzle, convincing another 
person of the reasonableness of our 
point of view. Sometimes, indeed, we 
may be doing no more than putting 
ourselves in a relaxing or largely pas- 
sive situation like lying down to rest, 
sunning ourselves, or watching a play 
on television, and in these cases the ac- 
tivity is preparatory and the receptive 
condition about the object (rather than 
the action) seems primary. 

It is a mistake, as Henry Murray has 
so clearly seen (Kluckhohn & Murray, 
1956), to identify the goal or end of 
the motive with the subjective state of 
satisfaction which follows from the ob- 
jective activity or condition of posses- 
sing or avoiding the object, for this 
leads to a kind of orectic solipsism in 
which the subjective state of desire 
has its term in the subjective condi- 
tion of satisfaction and which quite fails 
to describe the objectively oriented di- 
rection of most human motivation. It 
seems also to lead to a logical regress 
in which what is sought prospectively 
is the satisfaction which comes only 
after what is sought is had. Quiescent 
contentment may often be the chrono- 
logical terminus of the motivational se- 
quence, but it is not regularly the psy- 
chological end or goal of the motive, 
though the partisans of hedonism from 
Aristippus of Cyrene to Sigmund Freud 
seem never to have appreciated this 
distinction. 

With regard to the physiological mo- 
tives or drives themselves, we may, 
with Cannon, designate these as “ho- 
meostatic,” since in most cases what 
may be called a disturbance of biologi- 
cal equilibrium or homeostasis is in- 
volved. We could then say that when 
this disturbance reaches the psycho- 
logical level of felt tension, it expresses 
itself as a motive or tendency toward 
activity or receptivity about a certain 
object. 


Categorical Drives 


Of the homeostatic motives or drives, 
only four can be classified as “cate- 
gorical,” i.e., as unconditionally neces- 
sary for the continued existence of the 
organism, viz., hunger, thirst, the ex- 
cretory, and rest-or-sleep drives. We 
may note that rest-sleep and excre- 
tion are initially realized by simple 
reflexes and take on the quality of mo- 
tives only with maturation. Under ex- 
treme need, also, the rest-or-sleep and 
excretory drives return to their status 
of being reflexly determined and thereby 
are removed from the realm of motiva- 
tion. In the human, of course, hunger 
and thirst are not initially separable 
and in the condition of neonatal help- 
lessness are less “coping” and more 
dependent on receptive reflexes than 
they will be after a certain maturation. 


Pre-emptive Drives 


Besides the four homeostatic mo- 
tives which are categorical, there are 
three or perhaps four other drives 
which, while not strictly necessary to 
the maintenance of individual existence, 
can at times assume greater importance 
for the organism than the categorical 
drives themselves. We may call such 
drives “pre-emptive.” 

Of these first mention should be 
made of the negative drive of pain 
avoidance. Its fundamental nature is 
to some extent revealed by the con- 
sideration that even categorical drives 
like hunger and thirst operate in part 
through it. To many (for instance, 
Tolman, 1942) it seems that it is the 
pangs of hunger and the discomfort 
of thirst that make the complementary 
food seeking or drink seeking activi- 
ties so compulsory and commanding. 
One can, to be sure, overweigh the 
deficit or pain-avoidance aspect of the 
categorical drives, but there is no rea- 
sonable doubt of their connection with 
pain in cases of marked deprivation. 

In any event, few would disagree 
that pain is often a signal of non-well- 
being and as such of great biological 
significance, or that psychologically 
few motives are as strong and effective 
as the avoidance of intense pain. All 
the beauties and values of existence 
fade for the man with an earache or 
an abscessed tooth, though we know 
a person may continue to exist for 
years in intense pain. We say “ex- 
ist” rather than “live,” for if the gen- 
erally successful avoidance of pain is 
not necessary to life, it is necessary to 
make life worth living for most of us, 
however much the mature person may 
recognize the inevitability of pain and 
even its positive value for emotional 
growth. As a biological motive, pain 
avoidance almost perfectly typifies the 
quality designated by the term pre- 
emptive. We may use this latter term, 
then, to designate biological motives 
which tend to command the organism’s 
field of attention and to take precedence 
over or displace other motives which 
may be more fundamental or necessary 
to the organism’s self-maintenance. 

In order not to disappoint those who 
place the psychologist only after the 
psychiatrist and the traveling salesman 
in the disposition to overvalue the sex 
motive, we should probably concede 
that for most persons, and perhaps 
this is more nearly true in the round 
for males than females, the developed 
sex motive has a pervasive and insist- 
ent quality that would warrant classing 
it among the pre-emptive drives. 
While learning undoubtedly plays a 
great part in its development, there is 
still a wealth of evidence that the sex 
motive as developed by intrinsic matu- 
ration and the appropriate environ- 
mental releasers is too intense and in- 
trusive to be regarded in most males 
as anything less than pre-emptive. 

As Freud (1938) rightly noted, sex 
is originally a completely self-directed 
or narcissistic responsiveness. Though 
Freud was also profoundly right in 
pointing to the erogeneity of the oral 
and anal zones, he was wrong, it would 
seem, in not recognizing that the origi- 
nal and proper sex motive is based 
entirely on the sensitivity of the indi- 
vidual’s genitals. There is nothing to 
suggest that in the infant sex has any 
original reference to another person or 
in the strict sense to any region of the 
body other than the genitals, though 
with maturation it will become linked 
with the oral and anal and other eroge- 
nous zones, and with maturation also 
it will come to be evoked primarily in 
relation to another person, normally 
of the opposite sex. 

The motive toward intimate physi- 
cal contact with another human being 
—a special kind of affiliative motive, 
which until Harlow’s studies of “con- 
tact comfort” was so much neglected by 
psychologists that even the name for 
it is scarcely known—we call “con- 
trectation.” Though basically tactual, 
contrectation is originally quite distinct 
from the basic sex drive and has no 
necessary relation to the genital organs 
(Dalbiez, 1938). In the normal course 
of development, however, the individual 
sex drive and the contrectational mo- 
tive become fused to form the core of 
the mature heterosexual motive. Non- 
sexual contrectation is manifested be- 
haviorally by clinging to, patting or 
stroking, embracing, and the like. Its 
capacity to release specific sexual mo- 
tivation sometimes comes as a shock 
to the early or preadolescent, but this 
capacity can certainly not be regarded 
as fortuitous. 

The third of the pre-emptive motives 
is the activity drive. Its homeostatic 
basis seems to be in an energy surplus 
of the well-fed and rested organism 
and its reflex or automatic Anlage in 
the myoneural reflexes and spontane- 
ous discharges of skeletal musculature. 
These spontaneous discharges are ap- 
parently neither goal oriented nor felt 
originally, so we should not speak of 
an activity drive or activity motive until 
a certain degree of maturation has oc- 
curred. On the other hand, when 
muscular activity is employed for some 
specific goal rather than for the simple 
release of muscular tension, we should 
describe the motive in relation to this 
goal rather than by reference to the 
activity drive alone. The pre-emptive 
quality of the activity motive is best 
demonstrated when we limit the organ- 
ism’s free movement. Though neither 
painful nor preventive of specific goal 
attainment, such restraint is generally 
intolerable to the organism. When the 
generalized need for discharge of mus- 
cular tension is reduced by external 
goal-oriented behavior, as it is perhaps 
ordinarily, the activity drive has be- 
come absorbed into some other motive 
such as mastery or play and is only 
implicitly operative. 

In some respects the maternal motive 
should be classified among the pre- 
emptive drives, but the actual motive 
is so complex and variable in the hu- 
man and so overlaid with learning that 
the simple maternal succorance drive 
(with its lactogenic and other pituitary 
hormonal bases) seems in itself to 
constitute only a tiny—and it may well 
be dispensable—nucleus of human 
mothering behavior. Though David 
Levy (1942) has presented some evi- 
dence of connection between hormonal 
femininity and social maternity, his 
findings have never been independently 
confirmed, and there is no gainsaying 
the fact that the maternal motive in 
the human female may be overpower- 
ingly strong post partum or to all in- 
tents and purposes nonexistent. It 
would seem, therefore, appropriate to 
designate the drive itself as incomplete, 
in that by itself it does not account for 
a great deal of maternal behavior and 
since sociopsychological factors may 
far outweigh or nullify its effects. In 
some instances, perhaps, we can say 
as much for the sex drive. 

In summary, we have mentioned 
hunger, thirst, the excretory, and sleep- 
or-rest drives; pain avoidance, sex, ac- 
tivity, and the maternal drive. These 
are, if our thesis is correct, the only 
strictly biological motives or drives. 


REDUCTIONISM IN MOTIVATION 


Because the biological drives are so 
palpable and powerful and because mo- 
tives like curiosity and mastery appear 
ordinarily chronologically after such 
motives as hunger and thirst, it is a 
great temptation to suppose that one 
might “explain” such motives as curi- 
osity and mastery by some process of 
associative connection with or con- 
ditioning of the biological drives. And 
many have fallen prey to that tempta- 
tion: not only Watson and Freud, but 
Clark Hull and Neal Miller, John 
Dollard and Hobart Mowrer, Margaret 
Mead and Geoffrey Gorer, Ernest Hil- 
gard and Laurance Shaffer, and many 
another experimentalist, anthropologist, 
and clinician. Most have concluded 
that man’s only original motivational 
endowment is the biological drives, 
while behaviorists and those influenced 
by them have added a few closely re- 
lated states like fear, anger, and dis- 
like arising chiefly from the frustra- 
tion of his efforts to eat, drink, sleep, 
excrete, to be active muscularly and 
sexually, and to avoid pain. These, 
plus the laws of associative learning or 
conditioning as the behaviorist sees it, 
are all that man has to begin with 
(Watson, 1930), but this motivational 
vacuum will be filled by a plethora of 
“secondary drives” or “learned mo- 
tives” arising from the repeated ful- 
fillment or frustration of these origi- 
nal motives that is involved in social 
conditioning (Dollard & Miller, 1950). 

The Freudians, especially the ortho- 
dox Freudians but to some extent the 
neo-Freudians as well, have tended to 
admit only two motives: a general 
erogeneity or libido or sexuality (Eros) 
and a general destructive aggressive- 
ness (Thanatos), both of which are 
felt as unpleasant tensions seeking dis- 
charge. These motives may find re- 
lease in many ways, direct and indi- 
rect; not only in genital, anal, oral, 
urethral, and similar activity but in an 
infinitude of symbolic substitutes: neu- 
rotic symptoms, artistic endeavor, re- 
ligious ritual, humanitarianism, and 
even psychological investigation. But 
in all these substitute activities, in all 
displacements, topographical or sym- 
bolic, the generic aim remains un- 
changed: the release of a sexual-like 
tension (Hall, 1954). There are no 
psychologically significant motives ex- 
cept sex and aggression. There are 
just many ways in which these mo- 
tives can be reduced, and it is these 
identical motives which are being ful- 
filled by smearing excrement or paint- 
ing the Sistine Chapel, by hysterical 
conversion, sexual exhibition, or the 
most artful rendition of Hamlet’s so- 
liloquy or a Bach prelude, by scientific 
study or by window peeping, by the 
humanitarian dedication of Schweitzer 
or the mass murders of Hitler. 

Thus a motive like curiosity or mas- 
tery or affection must be seen as ef- 
fective only because of an associative 
pairing with the satisfaction of a bio- 
logical drive like hunger or sex, or 
because its goal is both symbolically 
and ontologically equivalent to an ac- 
tivity which reduces sexual or similar 
tension. 

These views of human motivation, 
it may be added, seem to reflect more 
the extrapolation of theories arising 
from studies of albino rats and the re- 
gressive fantasies of near-psychotic 
adults, than the careful observations of 
normal children or even of higher mam- 
mals such as the macaque. 


“ANASTATIC” MOTIVES 


What Woodworth (1918) has been 
maintaining for almost 50 years and 
Allport for 30 has been given new di- 
mensions by the comparative and phys- 
iological psychologists, especially Har- 
low. And it is this precisely that the 
two schools—behaviorism and psycho- 
analysis—which have dominated psy- 
chological thinking for the past genera- 
tion or more have most signally failed 
to appreciate: viz., the co-primacy of 
a group of motives distinct from and 
irreducible to the biological drives. 
The term “anastatic” has been coined 
to describe these motives, since they 
are concerned not with the mere main- 
tenance of life or the organism’s re- 
turn to a previous tensionless state 
(homeostasis) or with the removal of 
unpleasant deficits; but rather with the 
enhancement of the individual life, its 
passage to a new and richer condi- 
tion (anastasis) beyond the mere re- 
duction of segmental tensions. It 
would seem that anastatic is a more 
precise descriptive term than Wood- 
worth’s (1958) “dealing with the en- 
vironment” or White’s (1959) “com- 
petence,” conceptually close to each 
other as the underlying notions are. 

Certain of these anastatic or self- 
expressive or abundancy motives are 
cognitive in nature, having to do with 
the exercise of sensory and perceptual 
and related powers; others are motoric, 
involving the exercise of manipulative 
and other motor abilities. All are, with 
one possible exception, quite clearly 
discernible in the behavior of normal 
human infants in the first year of life, 
and exhibit no dependence or evident 
connection with hunger or sex. 

On the cognitive side, primary men- 
tion must be given to curiosity, or the 
motive to investigate, at first tactually 
and visually, every new object. The 
normal baby is alert to any new sen- 
sory stimulus. He will follow it with 
his eyes, turn his head to trace its 
movements as soon as motor develop- 
ment permits, and grasp at whatever is 
brought within his reach for the sake 
of conveying it to his principal tactile 
organ, his mouth; and it is doubtful 
that this latter has any necessary con- 
nection with the hunger drive, since 
the infant still endeavors to “incorpo- 
rate things orally” or at least to pal- 
pate them buccolabially when his ap- 
petite for food is quite satisfied. 

Responsiveness to sensory stimuli is 
by no means indiscriminate, however. 
From early infancy the human exhibits 
another cognitive motive which we may 
call sensory preference: e.g., for sweet 
tasting over bitter substances, for 
bright over achromatic colors, for 
smooth tones of moderate frequency 
over uneven and grating noises, and 
most importantly, as already indicated, 
for soft, fur-like surfaces over rough 
tactual stimulators. 

On the motor side the outstanding 
motive exhibited in infant behavior is 
an extension of cognitive curiosity. 
We may call this motive “mastery” 
and describe it as the motive to control 
the physical environment: to overcome 
obstacles, manipulate objects, produce 
various visible and audible effects on 
things around us—the more visible and 
audible the better—simply, so far as 
we can see, as a means of self-assertion. 
No one who has a normal 2-year-old 
around his house could seriously doubt 
the primacy and vigor of this mastery 
motive. From the moment that the 
toddler hurls open the door of his 
bedroom in the morning until he is 
dragged back to his bed at night, pro- 
testing vigorously, does there seem to 
be any surcease from his determina- 
tion to assert himself in every way pos- 
sible. Such activities as turning on 
the taps in the bathroom sink full blast, 
climbing shelves like steps, stuffing a 
cat into a bureau drawer, hitching the 
dog to a tricycle, turning up the vol- 
ume of the television set just below the 
threshold for audiogenic seizure, pound- 
ing on furniture with a mallet, turning 
lights on and off continually, dialing 
numbers at random on the telephone, 
throwing tennis balls at the window, 
“helping” his mother in whatever she 
is trying to accomplish, appear to be 
self-sustaining and to bear no traceable 
relation to hunger, thirst, sex, excre- 
tion, or any other biological motive. 

Closely allied to mastery and per- 
haps equally autonomous and unlearned 
is the play motive which man shares 
with the young of most mammalian 
species. In human play, even that of 
the young child, we may note an ele- 
ment of imaginative productivity or 
creative make-believe that is probably 
absent for the most part in the play of 
animals, though we might have to make 
an exception for such items as the 
imaginary pull-toy of Vickie, the 
chimpanzee raised by Cathy Hayes 
(1951). If we wish a name for this 
aspect of motivation—which enters into 
many activities besides play—we might 
settle for “fantasy.” Curiosity, mas- 
tery, sensory preference, play are sel- 
dom found on lists of primary motives; 
but wherever careful developmental 
observation of human children and ex- 
perimental studies of primates are sub- 
stituted for psychoanalytic reconstruc- 
tions and behavioristic extrapolations— 
as in Harlow’s work with rhesus 
monkeys raised in isolation—the evi- 
dence mounts that these anastatic mo- 
tives are unlearned, independent of, 
and co-primary with the biological 
drives. 

Thus Harlow (see Ruch, 1958) re- 
ports that during the first day or two 
of life the monkey invariably evidences 
visual curiosity and exploration. If 
illuminated moving objects are placed 
outside their cage, the young monkeys 
toddle across the cage to them, try to 
grasp them, and failing this stare fix- 
edly at them for minutes or hours. 
Curiosity and mastery are inseparable 
in the monkey too, for, as Harlow notes, 
he will search his cage for anything to 
examine or manipulate. If it is only 
a piece of hanging chain, he will tug it 
or bat it hundreds of times a day. He 
will learn to solve a mechanical puzzle 
for no other reward than the solving of 
it. Far from being derived from 
hunger, curiosity, mastery, and play 
often precede it. Harlow thus observes 
that monkeys first treat solid food as 
toys or pellets to manipulate and toss 
about, and that they may play with it 
for days before they take their first 
bite of it. 

What Harlow calls “contact com- 
fort,” and might preferably be desig- 
nated as a rudimentary form of con- 
trectation, is certainly an example of 
tactual sensory preference that is pres- 
ent in the macaque at birth and so 
commanding in its motivational strength 
that it will at times take precedence 
even over such “innate” and “pre- 
potent” response patterns as the right- 
ing reflex! Thus each of Harlow’s 
monkeys cathects and cherishes the dia- 
per surrogate placed in the bottom of 
his cage. He will sob and carry on, 
sometimes to the point of taking a tan- 
trum, when the cloth is removed for 
laundering. Anyone who has seen a 
human child cling fiercely to a stuffed 
dog or a filthy old blanket (in the man- 
ner of the Peanuts character, Linus) 
will appreciate the similarity. 

Most important perhaps of all Har- 
low’s (Harlow, 1958) now classic re- 
searches is the demonstration that in 
the monkey this search for soft or con- 
trectational objects is far more inti- 
mately related to the development of 
affection than is the satisfaction of hun- 
ger needs. In Harlow’s contrectational 
analogue of the adequate versus the 
inept mother, it is the panda-like 
dummy covered with terry cloth that 
commands the affectionate dependence 
of the young monkey even when it is 
the wire-mesh mother that provides 
the narcissistic supplies by way of a 
built-in bottle. Almost everyone has 
by now seen the films in which the 
young monkey at the first sight of 
danger runs to and clutches his terry 
cloth mother; and if this tactually be- 
nevolent mother is taken away, Har- 
low reports, the baby screams and 
cries most pitiably and is comparably 
overjoyed upon her return. Not so 
with the wire-mesh mother. When 
driven by hunger, the baby monkeys 
feed off her perfunctorily but otherwise 
pay her little heed. It is therefore not 
oral but tactual dependence that pro- 
vides the Anlage for filial affection in 
the young monkey, a cruel blow to 
Freudian speculation if the analogy 
holds for the human. We may soon 
be ready to accept the truism, obscured 
by a generation’s preoccupation with 
the mythology of the libido, that sex too 


298 Raymond J. McCall 


is more tactual than oral, and even ab 
initio closer to embracing than to 
eating. 

In speaking of the biological drives 
we have stressed the importance of the 
pain-avoidance motive. With regard 
to anastatic motivation it would be rea- 
sonable to add a generalized extension 
of pain avoidance—which we might 
call “safety seeking”—as a negative 
concomitant. Though it cannot pres- 
ently be documented, it is reasonable 
to assume that the emotion of fear, to 
which we shall advert in another con- 
nection below, is intimately related to 
pain avoidance, and that safety seeking 
may be regarded as the natural prod- 
uct of fear. Perhaps safety seeking is 
an emergent rather than an original 
disposition, but it seems to be suffi- 
ciently natural and inevitable in the 
human being (and probably in the 
higher mammals generally) to be 
classed as a basic motive. 


SOCIAL MOTIVATION 


Are there any social motives which 
can be regarded as unlearned and au- 
tonomous? With Adler we can say: 
yes, at least one, and that is the primi- 
tive social disposition itself, the posi- 
tive or adient tendency toward other 
human beings, without which the social 
condition of man would be a servitude. 
We may call this motive “affiliation,” 
and note the inevitable sign of its ap- 
pearance in the social smile of the hu- 
man infant, given in response only to 
another human and appearing in all 
normal children before the sixth month 
of life as spontaneously as, though 
earlier than, the plantar reflex. 

If the social smile is interpreted— 
not too rhapsodically, we may trust— 
as a kind of joyful assent to our social 
interdependence, an evidence of the 
natural sociability of man, then we can 
regard the social expression and the 
cultural patterning of our other basic 


motives, biological and anastatic, not 
only as an acquiescent necessity de- 
termined by the physical fact of human 
weakness and dependence, but as cor- 
responding to a psychological need to 
affiliate. 

Though affiliation and dependence 
are as closely related as the two sides 
of the same coin, it is surely a mistake 
to make of dependence itself a motive. 
Such appellations as “dependency 
needs” and “dependency motives” thus 
seem, despite their currency, to rest 
upon a confusion of biophysical fact 
with the psychological disposition that 
makes that fact tolerable. There is no 
call, to be sure, for affiliation to be 
self-sustaining, since the reality of his 
own dependence is brought home to 
the human being very early in his de- 
velopment. It is reinforced by his bio- 
logical need to avoid painful depriva- 
tion of the things he cannot provide for 
himself, by his search for safety, and 
by the ineffectiveness of his very af- 
filiative motive without a reciprocal 
affiliative response from those upon 
whom he is dependent. This recipro- 
cal affiliative response involves at 
least the acceptant approval of others, 
so that our pain-avoidance and safety 
seeking motives and our affiliative dis- 
position may be said to converge in a 
need for approval or assurance of sup- 
port from others. In its social ex- 
pression the corresponding motive, 
representing this convergence, may be 
designated “security seeking” or “the 
security motive.” In the measure of 
our accurate registration of our de- 
pendence each of us knows that with- 
out the approving support of others he 
is next to nothing. It is not surprising 
then that the search for approval or 
security—and perhaps more often the 
avoidance of the opposite—colors vir- 
tually every other motive: determining 
not whether we shall eat and drink 
but what and when, under what con- 
ditions and circumstances we shall 
sleep and excrete, when we have to 
bear pain, put up with danger and in- 
hibit self-expression; the limits of curi- 
osity and mastery, the rules of play; 
the channeling of all our efforts toward 
self-actualization into various social 
roles, occupational, sexual, familial, 
recreational. 


ANXIETY 


It seems likely also that it is se- 
curity seeking in this sense that lies 
at the root of that special kind of fear 
reaction which we designate in psycho- 
pathology as anxiety. As every clini- 
cian knows, in its original sense anxi- 
ety refers to a fear whose object is 
nonspecific and difficult to identify, a 
fear, nevertheless, which is at times 
overwhelmingly powerful and always 
unsettling. The Freudians have iden- 
tified it as the basis of most neuroses 
and have in typical fashion ascribed it 
to the birth trauma, the threat of cas- 
tration, the fear of one’s own sexual 
or destructive impulses, and other 
libidinal constructs. Many clinicians 
who have worked with delinquents 
have been struck by the apparent ab- 
sence of this anxiety in the sociopathic 
offender, while those of us who have 
attempted to counsel college students 
(and even those who have but prac- 
ticed the art of self-examination) 
know that anxiety is by no means con- 
fined to the neurotic, the incipiently 
psychotic, and the latently homosexual. 

May not one then argue that the 
only experience which every human 
being undergoes in his life and which 
has at once the generic quality and the 
power to match the paradoxical vague- 
ness-and-intensity of anxiety is the 
experience of social disapproval? 
Though this social disapproval is in 
nearly all cases originally parental, one 
soon comes to dread the disapproval 
of playmates, teachers, neighbors, and 
finally just of “others,” or perhaps one 
should say, of the Indefinite Other. 

It is plausible to look upon this 
threat of general social disapproval or 
affiliative abandonment as analogous to 
the helpless situation of the newborn 
infant, or even more plausible to see 
it from the perspective of the toddler 
who is faced with what may seem like 
psychosocial annihilation by the threat- 
ened withdrawal of parental support. 
It is also reasonable to suppose that 
the experience of anxiety represents a 
redintegration of the helpless terror of 
the forsaken child; but it seems much 
more parsimonious to view this name- 
less dread as rooted in the early (but 
not paranatal) and general fear of dis- 
approval rather than in particularized 
fears like that of castration, or of one’s 
own unacceptable impulses, or of be- 
ing born again. The latter again seem 
required only by the a priori assump- 
tion that all psychic causality is origi- 
nally concrete and psychosexual and 
in no way demanded by a reasonable 
matching of antecedent with conse- 
quent. 


SOCIAL MASTERY AND HOSTILITY 


There is another side to social moti- 
vation and its emotional reverbera- 
tions, however, which we have so far 
not considered, and this we may call 
social mastery or ascendance. If de- 
pendence and the search for security 
are the obverse of affiliation, social 
mastery or ascendance is its inverse. 
As dependence and security seeking 
lead to submission, so ascendance is 
expressed in aggressive self-assertion; 
and the latter would seem to be simply 
the social extension of the mastery mo- 
tive, so palpable and ubiquitous in the 
2-year-old. It is no wonder, then, that 
human social existence seems eternally 
suspended between the contraries of 
submission to and domination of 
others. 

Closely related to social self-asser- 
tion is aggressive hostility or the social 
expression of anger. As with anxiety 
and dependency motives, there has 
been much theorizing about the se- 
quential connection between aggressive 
hostility and “frustration,” and a num- 
ber of alleged verifications of this con- 
nection which seem more in the nature 
of laboratory exemplifications of some 
highly circular and equivocal semantics 
than confirmation of testable hypothe- 
ses or even empirical denotations of 
the carefully observed. In the Dol- 
lard and Miller (Dollard, Doob, Sears, 
Miller, & Mowrer, 1939) treatment of 
the “frustration-aggression hypothe- 
sis,” for example, the term frustration 
is at times employed so vaguely as to 
be in effect identical with any negative 
or undesirable state of affairs. It is 
this vague meaning which has carried 
over into popular usage, so that when 
we say in everyday speech that a situ- 
ation is “very frustrating” we often 
mean no more than that we do not like 
the situation, that we are not getting 
from it what we like or liking what we 
get. To argue, in this sense, that 
frustration engenders aggression is 
simply to say that our behavior often 
manifests anger or vigorous dissatis- 
faction with a state of affairs we do 
not like, a proposition of little novelty 
or specific explanatory value. More- 
over, not all dislike leads to anger or 
aggression, and to designate as frustra- 
tion just the particular kind of dislike 
that presages anger is to label in cir- 
cular fashion but not to explain or 
even accurately to describe. What 
produces anger and aggression? Frus- 
tration. What do you mean by frus- 
tration? That which makes us angry 
and aggressive. 

If, on the other hand, we ask people 
to describe the recent situations that 
have made them angry, we find that 
in less than one-quarter of the cases 
is anger a response to interference with 
our ongoing motivated activity—the 
sort of condition that might correctly 
and univocally be described as frus- 
tration—e.g., heavy rain on the day 
we planned to play golf, our car’s not 
starting, a window’s refusal to open, 
getting poor cards in bridge; but more 
than 70% of reported instances of 
anger reflect disapproval of the be- 
havior or remediable qualities of other 
persons (Cason, 1930)³: the driver of 
another car “cutting us off,” a person 
abusing a child, manifesting bad table 
manners, having dyed hair, driving a 
foreign car, keeping us waiting, or in 
any of a thousand ways failing to meet 
our expectations of or standards for 
the behavior of others. Occasionally, 
too, our anger is directed to ourselves 
(actual) by ourselves (ideal) as when 
we are displeased with our selves for 
having been rude or impatient or 
stupid. 

If our theory of anxiety is correct, 
its Anlage is fear of disapproval by 
others. Might we then not say that 
our anger or hostility is in a sense the 
reciprocal of our anxiety, that it ex- 
presses chiefly our disapproval of 
others? This would seem at least to 
be descriptively more accurate than 
the much touted frustration-aggression 
formula. 


SELF-AWARENESS AND SELF- 
EXPRESSION 


Whether man is moved by biological, 
anastatic, or social goals, whether he is 
progressing toward the accomplishment 
of constructive purposes to the reso- 
nances of such emotions as joy and 
longing, love and hope; or whether his 


3 Informal repetitions of Cason’s study 
with groups of psychology students yield 
very similar percentages. 


INVESTED SELF-EXPRESSION 


purposes are avoidant and defensive, 
laced with anxiety, hostility, suspicion, 
and resentment, it seems that there is a 
proper element in all human motiva- 
tion and that is the element of conative 
self-awareness. To use the term 
“conative self-awareness” is to stress 
the fact that distinctively human mo- 
tivation involves awareness of self not 
only after the operation of the motive 
or independent of the motive, but in 
the motive itself. When I am moti- 
vated in properly human fashion, I am 
aware of myself as motivated, and this 
awareness enables me to evaluate the 
motive in relation to the well-being 
(or values) of that self as I see it. It 
is the evaluation of the motive in re- 
lation to the cognized self that lends to 
the motive its degree of ego involve- 
ment or self-investment, or what All- 
port (1955) calls its propriate char- 
acter. We are moved by some goals 
much more than by others because we 
see them as closer to the needs or val- 
ues of the self. “That hits me,” we 
say, “where I live.” 

There is here no implication that the 
self as cognized is a merely subjective 
resultant, or that it is without its bio- 
logical and social determinants, any 
more than is the so-called “real” self. 
What I see myself to be is naturally 
determined in great measure by my 
abilities (including that of self-objecti- 
fication) and by my emotional disposi- 
tion or temperament, both of which 
are in turn largely determined by bio- 
logical or constitutional forces. And 
even more, as we all know, my self- 
concept echoes the evaluations of what 
the sociologists call the status or power 
figures in my identification groups. 
Robert Burns, somewhat romantically, 
pointed to the desirability and diffi- 
culty of seeing ourselves as others see 
us; but if what Harry Stack Sullivan 
urged, with profounder and equally 
poetic insight, is true, our very self- 
concept is so largely comprised of the 
“reflected appraisals” of others (Sul- 
livan, 1947), that the greater difficulty 
for us may be to see ourselves as other 
than as others see us! 

At the moment, however, our con- 
cern is less with the etiology of the 
self-concept than with its motivational 
efficacy. One might believe with 
Sherif and Cantril (1947) that the ego 
is entirely the product of social iden- 
tification and that we are ego involved 
only to the extent that our group’s 
values are at stake; or with Freud that 
the ego ideal is identical with the or- 
ally introjected (swallowed) parent, 
deriving all its power from the fear of 
castration at the hands of this parent. 
Even though such explanations may be 
more difficult to swallow than the 
parent in question, there is consola- 
tion in noting that phenomenologically 
at least both the contemporary Freud- 
ians and the social determinists recog- 
nize something like the emergent au- 
tonomy of the self-concept in motiva- 
tion. 

When we proceed beyond the pheno- 
menological toward the explanatory, 
we may be struck by the suggested but 
never clearly defined similarity among 
the views of such diverse thinkers as 
Erich Fromm, Robert Woodworth, 
Kurt Goldstein, Carl Rogers, Gordon 
Allport, Solomon Asch, Leona Tyler, 
and Robert White on the role of the 
self in human motivation. Here the 
Aristotelian notion of self-actualiza- 
tion, to which we referred earlier, 
might be taken as the asymptote by 
relation to which these different ap- 
proaches seem to converge; but there 
is too much of the ontological and the 
absolutist in Aristotle’s view of the 
self to accord with the empirical frame 
of reference of modern psychology. 
When the Aristotelian notion of self- 
actualization is transposed to a less 
metaphysical and universalist key, or 
to change the metaphor, is placed 
within a probabilistic system of co- 
ordinates, we have what may be called 
a theory of self-expression which 
comes close to signalizing the com- 
munalities in many contemporary self- 
theories. 

In its simplest form the theory of 
self-expression as conceived here con- 
sists of two fundamental propositions 
relative to human motivation: 

1. Behind and sustaining all or vir- 
tually all particular human motives 
there is an élan to maximize; not 
merely to maintain life—though that 
of course is basic to the enterprise of 
maximization—but to live it as fully 
as possible, to develop one’s capacities, 
extend and deepen experience, exer- 
cise one’s powers in the highest; in a 
word to achieve for one’s self the great- 
est possible self-enrichment psychologi- 
cally speaking. 

2. By reason of the limitations of 
his existential situation and in virtue 
of his powers of conceptualization and 
conative self-awareness man is both 
driven and enabled to concretize this 
urge toward the abstract ideal of the 
greatest possible into what he regards 
as a practicable or achievable set of 
goals or values. The ultimate goal of 
the maximum possible is the same for 
all, or at least for most, but the penul- 
timate goals are amazingly diverse. 
This seems to be due to the fact that 
certain objectives (and thereby their 
concomitant motives) come to be iden- 
tified as most instrumentally related 
to or involved in one’s greatest good, 
and for practical purposes—in the 
strict sense of the latter term—to 
symbolize this good. This is an al- 
ternative way of stating the principle 
of ego involvement or self-investment 
enunciated above. I am ego involved 
or self-invested in those goals which I 
have tabbed as indispensable aspects 
or generators of my self-enrichment. 


This principle, as we have also indi- 
cated, is per se independent of whether 
we stress native abilities and constitu- 
tional dispositions as greatly contribu- 
tive to the particularization of the 
greatest good, or rather minimize con- 
stitutional factors and insist that early 
conditioning and social learning specifi- 
cally determine what we value most. 

The principle of self-expression— 
unlike Maslow’s curiously popular ver- 
sion of self-actualization—is not a spe- 
cial kind of motive but an overarch- 
ing principle applicable to human mo- 
tives generally. All, or virtually all, 
characteristically human motives ex- 
emplify the élan to maximize: “defi- 
ciency” or homeostatic motives as well 
as “growth” motives. It is obvious 
that to live maximally, one must live, 
and the felt tendencies which we have 
called the categorical drives seem to 
be the motivational expression of this 
elementary fact. But the motives be- 
yond those necessary barely to sus- 
tain life, viz., the pre-emptive drives 
like sex and pain avoidance, the ana- 
static motives like curiosity, mastery, 
and sensory preference, and the affili- 
ative motives in all their ramifications 
reflect man’s restless quest for meliora- 
tion, for “better things.” Man looks 
for the maximum of self-realization not 
merely in the subjective categories of 
comfort and satisfaction but in the 
interactional and objective modes of 
comprehension and control, security 
and status, identification and belong- 
ingness, and in virtue of his ability to 
transcend the present, he looks for the 
perdurance of the good possessed as 
well as for its increase. However nu- 
merous the motives of men, therefore, 
there is only one major principle of 
motivation, as there is, according to the 
late Philip Murray, only one major 
principle of collective bargaining: 
more. 


THE DOUBLE AGREEMENT PHENOMENON:
THREE HYPOTHESES


MILTON ROKEACH¹
Michigan State University 


Thus far only one hypothesis has been advanced to account for the 
double agreement phenomenon: the “anti-content” response set
hypothesis. The purpose of this paper is twofold: (a) 2 additional “con- 
tent” hypotheses are proposed to account for the fact that Ss agree with 
a statement and with its opposite: S may be telling the truth in the 1st 
instance and lying in the 2nd instance; S may believe both statements 
but remains unaware of the contradiction through an act of compart- 
mentalization. (b) Peabody’s study, which provides the strongest 
evidence put forward thus far for the double agreement phenomenon, is 
re-examined to ascertain to what extent his data are in accord with the
alternative hypotheses proposed.


If a person agrees with a statement 
and agrees also with its opposite, 
how shall we interpret such behavior? 

Hypothesis A. Thus far only one 
hypothesis purporting to account for 
such behavior has been put forward: 
the response set hypothesis. Accord- 
ing to this hypothesis the subject is 
not responding to the content. He 
has obviously not comprehended the 
meaning of the opposing statements, 
for if he had, he would not have agreed 
with both. Instead, it is suggested, 
the subject is responding system-
atically in terms of some response
bias such as an underlying predispo- 
sition to acquiesce (Couch & Keniston, 
1960; Jackson & Messick, 1958). 

However, this predisposition to 
acquiesce will not necessarily become 
manifest under all circumstances. 
Cronbach (1946, 1950) and others 
have suggested that two conditions 
which are likely to increase the in- 
cidence of double agreement are 
ambiguity and difficulty of items. 
Another condition which has some-
times been mentioned is a “yea-
saying” item tone or phrasing.


1 This paper was written at the Center for
Advanced Study in the Behavioral Sciences
and is one of several being prepared with
support from a grant by the National Science
Foundation.

A major and general purpose of this 
paper is to propose that the mere fact 
of double agreement does not in itself 
constitute acceptable evidence for an 
acquiescent response bias, for the act 
of double agreement can also be 
accounted for on other psychological 
grounds. A more specific purpose is 
to propose that evidence for the 
unusually high incidence of double 
agreement responses to the F, Dog- 
matism, and Anti-Semitism scales 
(Adorno, Frenkel-Brunswik, Levin- 
son, & Sanford, 1950; Rokeach, 
1960), recently reported by Peabody 
(1961), can be accounted for equally 
well, and possibly better, on these 
other psychological grounds. 

Two additional hypotheses will be 
proposed here, to be called Hypotheses 
B₁ and B₂, which will also account for
the double agreement phenomenon.
In contrast to Hypothesis A which
says that the subject is not responding
to the content, Hypotheses B₁ and B₂
are “content” hypotheses. 

Hypothesis B₁. A person may agree
with a statement and with its opposite
because in both instances he reads
them, comprehends them, and re-
sponds to their content. He tells the
truth in one case because he sees no 
reason why he should not; he deliber- 
ately lies in the second case because 
he sees a good reason why he should 
not tell the truth. Let us assume 
that an anti-Semitic subject is pre- 
sented with two statements, some- 
what separated in time, designed to 
elicit his attitudinal response to Jews. 
The first statement is worded in a 
“pseudodemocratic” manner so that 
its antidemocratic tone will not be 
apparent, and agreement with it is 
indicative of anti-Semitism. Our 
subject reads and comprehends this 
item, and finding nothing objection- 
able about its contents or about his 
agreeing with it, sees no reason to lie. 
So he tells the truth by indicating 
behaviorally that he agrees with the 
statement. But when he is presented 
with the reversed statement he finds 
himself in a somewhat different psy- 
chological situation. He reads and 
comprehends the statement and ac- 
tually disagrees with it. But he sees 
a good reason why he should not tell 
the truth. For example, if he tells 
the truth, he anticipates the experi- 
menter may think him to be a bigot, 
or the experimenter may disapprove 
and apply sanctions. So he agrees 
with the statement. He thus tells 
the truth to the original statement, 
and he deliberately lies to the reversed 
statement. In doing so, he has agreed 
with the original statement and also 
with its reversal.²


2 Hypothesis B₁ would also have to include
the logical possibility that the subject de-
liberately lies when agreeing with originals
and tells the truth when agreeing with re-
versals. However, it is the writer's opinion
that this possibility is a psychologically mean-
ingless one and hence less probable than the
converse, that is, truthfully agreeing with
originals and untruthfully agreeing with
reversals.

In everyday life similar motivations
may lead to the double agreement
phenomenon. For example, a stranger
on a train may make derogatory 
statements about Negroes and Jews 
to a companion sitting next to him. 
If such views are challenged by his 
companion or if the companion iden- 
tifies himself as a Jew, the stranger 
may then back off or perhaps even 
express favorable views about Ne- 
groes or Jews, views he doesn't really 
believe, in order to smooth things 
over or to avoid unpleasantries. 

Hypothesis B₂. A second possibility
is that a person may agree with 
logically incompatible statements be- 
cause they both represent views he 
really endorses. The logical fact that 
they are contradictory need not 
bother him, since through an act of 
compartmentalization, or isolation or 
“double-think” he can remain con- 
sciously unaware that he is contra- 
dicting himself, thus preserving his 
self-image as a logical and consistent 
person. 

Again, some illustrations may be 
cited. Allport and Kramer (1946) 
found that bigots do not see them- 
selves as bigots but as objective 
and democratic-minded people who 
also express democratic sentiments. 
Myrdal (1944) has made famous the 
notion of the “American dilemma” 
which points to the fact that many 
Americans believe in democracy, fair 
play, and individual freedom and at 
the same time endorse beliefs sup- 
porting segregation. Subjects will 
often agree with double-barreled 
statements in the Dogmatism Scale 
(Rokeach, 1960) in which the second 
part contradicts the first part; for 
example, “The highest form of govern- 
ment is a democracy and the highest 
form of democracy is a government 
run by those who are most intelli- 
gent.” And one will often hear people 
in everyday life spontaneously express 
such contradictory beliefs. 
In any given instance wherein a
person agrees with opposing state- 
ments, Hypothesis A and/or Hy- 
pothesis B may be possible explana- 
tions, but to the extent that one of 
these is the more valid explanation 
the other is necessarily less valid, 
because: (a) Hypothesis B asserts 
that the act of double agreement is a 
response to content; Hypothesis A 
denies this. (b) Hypothesis A asserts 
that the reason why a person agrees 
with an original statement is exactly 
the same reason why he will agree 
with its opposite, namely, response 
set; Hypothesis B denies this, as- 
suming instead that the reason for 
agreement with one of the statements 
is not necessarily the same as the 
reason for agreeing with the opposing 
statement. For example, people who 
hold beliefs which are bigoted or 
authoritarian will often deny their 
bigotry or authoritarianism by also 
holding beliefs supporting tolerance 
and democracy. It may be noted with 
considerable interest that even Adolph 
Eichmann, hating and exterminating 
Jews as he did, denied in a Life story 
that he was anti-Semitic and instead 
claimed to be their friend. Similarly, 
the late Senator McCarthy undoubt- 
edly believed himself to be a lover of 
freedom and democracy, and an 
enemy of tyranny. It is possible to 
be bigoted or authoritarian in one’s 
beliefs, and at the same time rational- 
ize such beliefs in such a way that 
one also believes the very opposite, 
because people, including Eichmann 
and McCarthy, need to maintain 
positive self-images. This is where 
mechanisms such as compartmental- 
ization or isolation serve highly adap-
tive purposes.³


3 One possibly unwelcome implication of
Hypothesis B₂ is that high scores on scales
purporting to measure bigotry or authori-
tarianism do not distinctively represent such
antidemocratic views but instead simply
represent high frequency of contradictory
views. This would probably be more true in a
society where democratic norms prevail along
with antidemocratic norms (such as in the
United States) but would probably be less
true in a totalitarian society (such as in Nazi
Germany) where norms of tolerance and
equalitarianism were explicitly discouraged.
For example, it is to be doubted that Eich-
mann possessed such contradictory views
while he was still a powerful figure in Nazi
Germany. There was probably no need then
for him to rationalize his antidemocratic views
by also espousing contradictory democratic
views. In line with such considerations, the
writer would conclude that high scores on such
scales do not represent distinctively contra-
dictory views but instead antidemocratic
views.


In the past few years there has been
considerable interest and research
on the role of “response set” in per-
sonality and attitude scales and con-
siderable evidence has been put for-
ward which claims to show the prev-
alence of response set in such scales.
Perhaps the most important study
to appear thus far is one by Peabody
(1961) who presents highly convincing
data demonstrating that agreement
with statements and with their op-
posites occurs quite frequently, indeed
far more frequently than anyone had
previously suspected. He reports
that subjects who agree with the
original items on the F, Dogmatism,
and Anti-Semitism scales (items which
are all worded so that agreement is
indicative of authoritarianism, closed-
mindedness, and prejudice) also agree,
about 67% of the time, with reversals
of these items. Peabody assumes, as
do virtually all other researchers who
have dealt with the problem (for
example, Bass, 1955; Chapman &
Bock, 1958; Christie, Havel, & Seiden-
berg, 1958; Jackson & Messick, 1958;
Leavitt, Hax, & Roche, 1955), that
double agreement necessarily implies
the presence of acquiescent response
set (Hypothesis A). Other possible
explanations such as those presented
here (Hypotheses B₁ and B₂) are
not considered.

Let us now examine Peabody's 
findings in greater detail in order to 
see whether Hypothesis B is as capa- 
ble as Hypothesis A of explaining 
his findings. 

1. Peabody finds not only a high 
incidence of double agreement on 
original and reversed items measuring 
F and Dogmatism, but also on original 
and reversed items measuring anti- 
Semitism. This is a most curious 
finding because no one has presented 
evidence or has even seriously claimed 
that Anti-Semitism scale items are 
ambiguous. In contrast there are 
frequent claims that scales like the F 
and Dogmatism scales are ambiguous. 
How account for the fact that high 
frequencies of double agreement are 
found also on anti-Semitism items? 

2. Peabody finds (a) that subjects 
who agree with original items on the 
F, Dogmatism, and Anti-Semitism 
scales agree with reversals about 67% 
of the time, while (b) those who dis- 
agree with original items also agree 
with the reversals, and thus are 
responding to the content, about 85% 
of the time. How are we to account 
for these findings? The latter finding 
particularly is a strange one when we 
note Peabody's claim, which he makes 
without providing the slightest inde- 
pendent empirical support, that re- 
sponse set is a function of ambiguity 
of items. If the items are indeed 
ambiguous it is unlikely that anybody 
will respond consistently to their 
content. More precisely, it is possible 
but psychologically unlikely that those 
subjects for whom the original items 
are ambiguous will generally agree 
with such items, while those subjects 
for whom the original items are un- 
ambiguous will generally disagree 
with such items. And in the event 
that these items are differentially 
ambiguous to some subjects but not
to others (a) this fact would have to
be empirically demonstrated and (b) 
it would have to be further demon- 
strated that those for whom the items 
are ambiguous typically agree with 
the sets of opposing statements while 
those for whom the items are not 
ambiguous typically do not agree. 
Needless to say, a and b have thus 
far not been demonstrated. 

3. Peabody finds lower variances 
and reliabilities for reversed scales 
as compared with original scales. 
Chapman and Bock (1958) also find 
this to be typical of eight other studies 
which they review. Why should this 
be so? 

Answers to all three questions are 
readily forthcoming if we view these 
findings from the standpoint of Hy- 
pothesis B, but the first two questions 
at least provide embarrassment or 
are unanswerable from the stand- 
point of Hypothesis A. Findings 
1 and 2 above become more under- 
standable if we assume that all 
subjects in Peabody's study were 
responding primarily to content, not 
only on the Anti-Semitism scale⁴ but
also on the F and Dogmatism scales, 
and that those who agreed with the 
antidemocratic original items (worded 
in a pseudodemocratic manner so that 
subjects who agreed with them would 
not feel they were responding anti- 
democratically) also agreed with the 
democratically worded reversals for 
reasons already discussed, namely, 
they needed to maintain a positive im- 
age of themselves as decent people. 


4 Peabody forces the puzzling finding of
double agreement on anti-Semitism items
back into the mold of Hypothesis A by con-
cluding “a primary cause of agreement set is
the ambiguity of the original items [p. 10].”
But Peabody presents no evidence to support
this causal explanation. He presents no data
whatever concerning ambiguity of items, or
on the relation between ambiguity and
double agreement.


And since such people, those who
agreed with originals, also on the whole
agreed with the democratically worded 
reversals they were responding to 
these reversals in much the same way 
as those who disagreed with the 
original items. Thus, variance on 
reversals was necessarily reduced, and, 
consequently, reliability also. This 
accounts for Finding 3 above. How- 
ever, in fairness to those who favor 
Hypothesis A it should be pointed 
out that Finding 3 has also been 
adequately explained by Chapman 
and Bock (1958) who show that 
responses to reversed scales will 
typically have less variance and be 
less reliable than responses to original 
scales if the subjects are responding 
partly to content and partly in terms 
of acquiescence.⁵
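
The variance argument above can be illustrated with simple arithmetic. The following is a minimal sketch and not part of the original analysis: it treats each response as a binary agree/disagree item, takes the 67% and 85% figures cited from Peabody, and assumes, purely for illustration, that half of the subjects agree with the original items. The constant and function names are hypothetical.

```python
def bernoulli_variance(p):
    """Variance of a binary (agree = 1, disagree = 0) response with agreement rate p."""
    return p * (1.0 - p)


# Hypothetical assumption: half of the subjects agree with the original items.
P_AGREE_ORIGINAL = 0.50
# Figures cited in the text from Peabody (1961).
P_AGREE_REVERSAL_GIVEN_AGREE = 0.67     # agreed with original, also agree with reversal
P_AGREE_REVERSAL_GIVEN_DISAGREE = 0.85  # disagreed with original, agree with reversal


def reversal_agreement_rate(p_original, p_given_agree, p_given_disagree):
    """Overall rate of agreement with reversals, pooling both groups of subjects."""
    return p_original * p_given_agree + (1.0 - p_original) * p_given_disagree


rate = reversal_agreement_rate(
    P_AGREE_ORIGINAL,
    P_AGREE_REVERSAL_GIVEN_AGREE,
    P_AGREE_REVERSAL_GIVEN_DISAGREE,
)
var_original = bernoulli_variance(P_AGREE_ORIGINAL)
var_reversal = bernoulli_variance(rate)

print(f"agreement with reversals: {rate:.2f}")
print(f"variance, original item:  {var_original:.4f}")
print(f"variance, reversed item:  {var_reversal:.4f}")
```

Under these assumptions both groups mostly agree with the reversals, so the pooled agreement rate is far from one half and the reversed item has less variance than the original item. A different assumed split on the originals would change the numbers but not the direction of the comparison whenever the split is nearer to one half than the pooled reversal rate is.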


CONCLUDING REMARKS 


The main purpose of this paper has 
been to point out that there are at 
least three explanations which may 
account for the double agreement 
phenomenon on personality and atti- 
tude scales. The currently most 
popular explanation (Hypothesis A) 
assumes that subjects are not respond- 
ing to content. The other two ex- 
planations assume that subjects are 
responding to content; they are either
deliberately lying to one of the two
opposing statements (Hypothesis B₁)
or they are compartmentalizing (Hy-
pothesis B₂).


5 A third explanation which has been offered
to account for the lower reliabilities of re-
versed scales is that the designers of reversed
scales are not as skilled in constructing items
as the designers of original scales. This ex-
planation cannot be taken seriously for two
reasons: (a) Reversed scale designers have
been at it now for almost a decade and one
would think that with all the practice and
experience they have accumulated in the
interim they would be getting sufficiently
skillful to construct reversed scales as reliable
as original scales. This does not appear to be
the case. (b) The “skilled” designers of origi-
nal scales (Adorno et al., 1950; Rokeach,
1956) report that they, too, have been un-
successful in their efforts to design reversed
items.

We have also presented reasons why 
we favor Hypothesis B over Hypoth- 
esis A in interpreting Peabody’s 
findings on the high incidence of 
double agreement on the F, Dogma- 
tism, and Anti-Semitism scales. De- 
spite these considerations the question 
as to which of the three hypotheses is 
the most valid is still an open one; 
our argument is suggestive, but by no 
means definitive. 

Further theoretical and empirical 
research is required to investigate 
the personal and situational conditions 
under which Hypothesis A or B₁ or B₂
will be the most valid explanation 
for the double agreement phenome- 
non. In the meantime, it is necessary 
to stress that the presence of a high 
incidence of double agreement, even 
if incontrovertible, cannot in itself, 
without additional evidence, be auto- 
matically interpreted to mean that 
the subjects are responding in terms 
of an agreeing or acquiescent response 
set. 

Finally, it may be asked: What are 
the implications of the present anal- 
ysis of the double agreement phe- 
nomenon for the interpretation of 
scores presently obtained on original 
(not reversed) scales measuring such 
variables as F, Dogmatism and anti- 
Semitism, as normally obtained in 
substantive research? Such original
scores may be in whole or in part a
function of (a) acquiescence, (b)
truthful responses to content, or (c)
untruthful responses to content. As
we have shown, the findings regarding
double agreement do not necessarily
favor Alternative a. What is needed
is additional, independent evidence
showing that the incidence of double
agreement is a function of item am-
biguity, evidence which thus far has
not been forthcoming. As for the
validity of Alternatives b or c, this
is a matter that can best be ascer-
tained when social and personality
psychologists get back to substantive
research studies designed to test the
construct validity of scales such as F,
Dogmatism, and Anti-Semitism.


6 Apart from reversal studies, the fact that
the F and Dogmatism scales are found to
correlate positively with other scales which
measure yeasaying (Couch & Keniston, 1960)
does not, in itself, constitute satisfactory
evidence that these two scales are measuring
yeasaying. All that can be safely said is
that yeasaying is a correlate of such scale
scores, a finding which reasonably could be
expected on theoretical grounds, even if the
F and Dogmatism Scales were completely
free of yeasaying.


PRIMARY STIMULUS GENERALIZATION¹


The purpose of this paper was to examine the widely held assumption
that there is an empirically based need for a construct of “stimulus
generalization.” This was done by (a) outlining its historical evolu- 
tion, (b) discussing its role as a label in secondary sources, and (c) 
evaluating methods employed in measuring it. It was pointed out that 
the construct is based upon some specious premises of early behav- 
iorism, that the way the label itself is employed assumes a unique proc- 
ess, and that the empirical operations for measuring it differ little from 
those employed in discrimination learning. It was concluded that, 
while the methods of measuring stimulus generalization might help to 
define percepts, the assumption of a special process under that label
contributes little to our knowledge of behavior.


Over the past 30 years a generic 
set of operations has evolved in the ex- 
amination of behavior, and from the 
results of these operations a number of 
experimenters have inferred a process 
which has been labeled “primary stim- 
ulus generalization.” The specific 
manifestation of this assumed process 
is the occurrence of a response to an 
experimenter event² which differs in
some way from the experimenter event
utilized during the acquisition of the
response. The frequency with which
the label “stimulus generalization” is
found in introductory and secondary
texts, as well as the number of experi-
mental and theoretical articles devoted
to it (see Mednick & Freedman, 1960,
for an extensive literature review) sug-
gests that the construct plays a central
role in our attempts to understand be-
havior modification.


1 This paper is an outgrowth of research
conducted under research grants G-2116
and G-7463 from the National Science Foun-
dation to, respectively, J. F. Hall and W. F.
Prokasy.

An early version of this paper was pre-
sented at the 1961 Psychonomic Society
meeting. The present form has benefited
from discussions at the meeting, as well as
from the subsequent comments of colleagues
at The Pennsylvania State University.

2 We have followed a precedent estab-
lished by Bush and Mosteller (1955) in
referring to “experimenter event” rather
than to “stimulus.” Our intent is to de-
lineate carefully between those operations
defined and manipulated by the experi-
menter, and the results of these operations
which the subject may or may not perceive.
For the latter, given the perception, we
will continue to use stimulus, though this
particular usage does not resolve all com-
plications involved in how the word stimu-
lus has been employed in psychology. For
a detailed discussion, see Gibson (1960).

Unfortunately, the label stimulus 
generalization has been employed in 
so many different ways that, at the 
outset, it is necessary to delineate that 
meaning associated with stimulus gen- 
eralization which we have elected to 
examine. First, we are not concerned
with theories in which this label is at- 
tached to a parameter or function 
within the theoretical framework 
(e.g., Hull, 1943; Shepard, 1957,
1958). In such instances, the meaning 
of stimulus generalization is (or should 
be) confined to whatever the parameter 
or function does within the theory. In 
this sense it has only indirect bearing 
on what is necessary to assume about 
behavior outside of the theory on the 
basis of existing data. 

Second, though we are concerned 
with the relationship of generalization 
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and discrimination, we do not propose 
to consider an analysis of what under- 
lies generalization phenomena under 
the assumption that, given knowledge 
of discrimination properties in a de- 
fined situation, any departure of re- 
sponse characteristics from expectation 
based upon discrimination properties 
should be called “generalization.” 
Such a position is hazardous, at the 
very least, since it affixes the label 
“discrimination” to hypothetical oper- 
ations the effects of which can be 
shown only through proof of the null 
hypothesis (i.e., proof that only these 
hypothetical limiting operations are 
having an effect in a particular be- 
havioral instance). Furthermore, in 
such an instance the label generaliza- 
tion would become a “junk” category 
and, consequently, virtually meaning- 
less. 

We are concerned, on the other 
hand, with what appears to be the most 
widely accepted meaning implicit in 
the label generalization. This posi- 
tion seems to be that there is an or- 
ganismic process, related to, or a func- 
tion of, stimulus similarity, and exist- 
ing beyond the operations of those 
variables which determine the capacity 
of an organism to discriminate stimuli. 
An implicit, if not explicit, assumption 
is that there are circumstances in 
which the organism does discriminate, 
but, nonetheless, responds because of 
a process, generalization. 

Manifestations of this general posi- 
tion can be detected in at least three 
ways. First, a number of writers have 
pointed to an assumed “biological 
utility” of generalization (e.g., Law-
son, 1960; McGeoch & Irion, 1952;
Osgood, 1953; Sidman, 1960). The 
burden of this position is that, since 
two sets of experimenter events are 
never repeated exactly, a “principle” 
of stimulus generalization must be in-
voked in order to account for the fact
that learning occurs. 

Second, some individuals (e.g.,
Hovland, 1951; Osgood, 1953; Wick- 
ens, Schroder, & Snide, 1954; Wood- 
worth & Schlosberg, 1954) have ex- 
pressed concern about “the shape” of 
the generalization gradient, again, as 
though there were a process which 
systematically manifests itself in some 
form as a function of stimulus simi- 
larity. 

Third, the acceptance of at least 
something unique in what is subsumed 
under the label stimulus generaliza- 
tion has been sufficiently general: (a) 
to encourage authors of integrating 
texts (e.g., Kimble, 1961; McGeoch
& Irion, 1952) to treat generalization 
as a phenomenon distinct from those 
encountered in a study of stimulus dis- 
crimination, thus, to some extent, en- 
couraging the idea that independent 
variable operations differentially affect 
what is subsumed under these labels; 
and (b) to provide a frame of refer- 
ence for other findings frequently 
“explained” through the “operation” 
of stimulus generalization (e.g., Battig 
& Bourne, 1961; Champion & Stand- 
ish, 1960; Wickens, Meyer, & Sullivan, 
1961). 

3 An interesting omission should be noted
with respect to what is “meant” when the
word generalization is employed. Some
investigators may presume the label itself
to be essentially meaningless, merely a cus-
tomary label for a myriad number of factors
operating to influence discrimination be-
havior, but we have been unable to find
sources in the literature which explicitly say
this. With the poverty of such a usage, one
can only impute a “meaning” based upon
what is in the published literature.

From time to time the value of the
concept of generalization has been
questioned (e.g., Lashley & Wade,
1946), but, by and large, such views
have not been accepted (see, e.g., Hull,
1947).⁴ It is the purpose of this
paper to investigate, further, what 
underlies the belief that there is an 
empirical need for a construct of 
stimulus generalization as representing 
a process which is either affected 
differently by, or by different varia- 
bles than, the variables which affect 
the degree to which an organism 
discriminates experimenter events (or 
stimuli). More explicitly, the posi- 
tion we propose to defend is that the 
organism develops, or makes, discrimi- 
nations based upon limiting neural 
characteristics, specific past training, 
sets, motivations, etc., and that there 
exists in the literature no empirical 
basis for presuming the existence of 
still an additional process, under the 
label of stimulus generalization. To 
the extent that this position is tenable 
there is no need for such a label, if for 
no other reason than that of parsimony. 
Toward our objective, we have divided 
the remainder of this paper into two 
major sections: Historical Influences 
and Assessment.

4 It should be noted that some investi-
gators specifically consider generalization to
be lack of discrimination (e.g., Deese, 1952;
Gibson, 1959). Furthermore, developments
in statistical learning theory, which have
forced greater attention to the way in which
a subject receives stimulation, are essentially
a derivative of the common elements theory
of transfer and, in general, do not differ-
entiate between generalization and lack of
discrimination (e.g., Bower, 1959; Bush &
Mosteller, 1951; Popper, 1959).


HISTORICAL INFLUENCES


The basis for the contemporary belief
in a unique process labeled generali-
zation rests in the original work of
both Pavlov (1927) and Bechterev
(1928). Pavlov (1927) states: “...
if a tone of 1000 d.v. is established as
a conditioned stimulus, many other
tones spontaneously acquire similar
properties, such properties diminish-
ing proportionally to the intervals of
these tones from the one of 1000
d.v. [p. 113].” Bechterev associated
generalization with the establishment 
of the motor reflex; i.e., during the 
period in which the reflex itself was 
being established, there was assumed 
to be an increase in the number of ex- 
ternal events which could result in the 
occurrence of the partially learned re- 
flex. Following this initial period 
with continued practice, the qualities 
of the specific environmental event em- 
ployed as a conditioned stimulus be- 
come differentiated. Bechterev (1928) 
states: “The period of generalization, 
which concludes the first stage of the 
inculcation of an association-motor re- 
flex, this first stage ending in a greater 
or lesser fixation of the reflex, is that 
period from which differentiation of 
the reflex begins in further incul- 
cation.” 

While Pavlov and Bechterev differ 
on the relative emphasis placed on 
aspects of generalization and differ- 
entiation, it is to be noted that both, in 
general, describe the learning of a con- 
ditioned response in terms of an initial 
occurrence of widespread responsivity 
to many external events, with a grad- 
ual restriction of range with continued 
practice. In addition, though Pavlov 
suggested that the physical similarity 
of the events seemed to be related to 
the degree of generalization, he reports 
that the events eliciting the condi- 
tioned responses were, on occasion, 
markedly distinct from the selected 
CS. The subject, for example, may 
have been conditioned to a specific 
sound, but responses were generalized 
to such events as the entrance of an in-
dividual into the laboratory setting
(Pavlov, 1927, p. 115). It is clear 
that a dog has the capacity to distin- 
guish the sound of a bell from the 
sight, sound, and smell of a person en- 
tering the laboratory; but it is equally 
clear that the same class of response,
salivation, was elicited by these two 
seemingly disparate classes of environ- 
mental events. 

Pavlov’s research, particularly, gave 
rise to a number of studies of general- 
ization in this country, several of the 
notable early ones being those of Bass 
and Hull (1934) and Hovland (1937a, 
1937b, 1937c, 1937d). While the 
theme of these researches was an ex- 
amination of Pavlov’s theory of irradi- 
ation, the guidelines for future work 
in generalization were formed. The 
degree to which the idea of generaliza- 
tion was incorporated into the think- 
ing of the 1930’s is amply illustrated 
by the following quotation from Hull 
(1934):


It is important to observe that the tend- 
ency to generalization is of primary impor- 
tance in biological economy, since without 
it organisms would need to undergo sepa- 
rate conditioning in order to react to every 
slightest variation in the conditioned stimuli 
which, strictly speaking, are never exactly 
alike on any two occasions [pp. 446-447]. 


Hovland’s research (1937a, 1937b, 
1937c, 1937d), in particular, drew at- 
tention to the problems of scaling the 
physical dimension of stimuli across 
which generalization was examined. 
In plotting response strength as a func- 
tion of jnd units of frequency, in- 
tensity, etc., a precedent, in wide use 
today, was established: the physical 
attributes of the training event defined 
the directions, and dimensions, over 
which generalized responding was 
measured, and each subject was pre- 
sumed to have a built-in psychophysi- 
cal scale, determining discriminability, 
across the physical dimension. 

During this particular span of time, 
the behaviorist movement was at its 
peak influence, and one result of this 
influence, in our opinion, was the im-
plicit adoption of two research prem-
ises, both of which have contributed
much to our present-day conception of 
stimulus generalization. The first was 
a strong emphasis upon the external 
physical definition of an effective stim- 
ulus. The experimenter defined the 
environment of the subject in terms 
of the experimenter’s manipulation of 
the physical attributes of an external 
event. The second premise was that 
of the S-R bond. That is, a response 
was attached to a set of physical ener- 
gies and only those energies could, 
presumably, elicit the response. Since, 
however, physical energies other than 
the ones to which the subject had 
been conditioned also elicited the con- 
ditioned response—particularly when 
it appeared that the two sets of physi- 
cal energies were discriminable to the 
subject—a principle to cover what 
would otherwise be an exception to the 
S-R bond was required.

These legacies of behaviorism and 
their effect on research in generaliza- 
tion cannot be overemphasized. Con- 
sider, for example, some concluding 
remarks Hovland (1937b) made in 
one of his classic papers on generali- 
zation: “Quite marked... is the 
augmented size of the response to the 
intensity farthest removed from the 
conditioned stimulus on the first cycle 
of testing. . . . No satisfactory expla- 
nation is available at present.” Since 
the principle of irradiation could not 
account for intensity gradients, Hov- 
land found what is called generaliza- 
tion, but could not account for it. 
Moreover, that the specific test stimuli 
were separated by 25 jnd’s or more 
encouraged the belief that they were 
discriminated. In other words, with 
the premises of physically defined ef- 
fective stimulus dimensions, discrimi- 
nation units (jnd’s), and specific S-R 
bonds, it did not occur to Hovland, 
at least in print at that time, that 
the subject might not have been re-
sponding in accordance with such
premises. 

In addition to the foregoing, the 
theorizing of both Spence (1936, 
1937) and Hull (1943, Ch. 12) tended 
to establish stimulus generalization as 
something distinct from discrimina- 
tion. For Spence (1936) discrimina- 
tion learning was derived from the 
algebraic summation of excitatory and 
inhibitory gradients of generalization. 
These gradients operated across physi- 
cally defined dimensions, and were 
functions of the psychophysical laws 
relating the organism to the physical 
dimension. For Hull, generalization 
was an exponential function of a jnd 
scale across physically defined dimen- 
sions. These theories accentuated the 
scaling and the physical dimension 
aspects of research in generalization, 
and in this way tended to encourage 
an empirical acceptance of a theoreti- 
cal construct. 
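
The two formulations just summarized can be illustrated with a minimal sketch. It is not taken from Spence or Hull: the exponential form is applied to both gradients purely for convenience, and the function names, the peak and decay values, and the positions of the reinforced (S+) and nonreinforced (S-) stimuli are all hypothetical. Distances are expressed in jnd units along a single physically defined dimension.

```python
import math


def gradient(distance_jnd, peak, decay):
    """Exponential gradient: strength falls off with distance in jnd units.

    `peak` and `decay` are hypothetical parameters chosen for illustration.
    """
    return peak * math.exp(-decay * abs(distance_jnd))


def net_tendency(stimulus, s_plus, s_minus,
                 exc_peak=10.0, inh_peak=6.0, decay=0.05):
    """Algebraic summation: excitation around S+ minus inhibition around S-."""
    excitation = gradient(stimulus - s_plus, exc_peak, decay)
    inhibition = gradient(stimulus - s_minus, inh_peak, decay)
    return excitation - inhibition


if __name__ == "__main__":
    # Tabulate the net response tendency at a few points on the jnd scale.
    for position in range(0, 101, 25):
        print(position, round(net_tendency(position, s_plus=50, s_minus=25), 2))
```

Under these assumed values the net tendency is positive at S+ and negative at S-, which is the sense in which discrimination is derived from the two gradients. Note that the sketch takes the physical dimension and the jnd scale as given, which is precisely the premise questioned in the present paper.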

Brown, Bilodeau, and Baron (1951) 
were among the first to treat stimulus 
generalization as strictly an empirical 
construct, presumably independent of 
any necessary theoretical connotations. 
In brief, their position is that when an 
organism has learned to give a con- 
ditioned response to a particular ex- 
perimenter event, it can be demon- 
strated that other events will also elicit 
the response even though these other 
events have not been employed in the 
conditioning context. It is, basically, 
this position which later integrating 
texts have, in general, incorporated 
into discussions of discrimination and 
generalization. 


ASSESSMENT 


Irrespective of whether or not there 
exists a process for which we require 
a unique label, generalization, the 
contemporary usage of an empirical 
position such as that adopted by 
Brown, Bilodeau, and Baron (1951)
has a subtle, but powerful, effect on 
the maintenance of a belief in a distinct 
process unique to what we subsume 
under the label generalization. A 
subject is described as learning to re- 
spond to an event, and then as gen- 
eralizing. Examples of this mode of 
description are many. For example, 
Mowrer (1960) states: “. . . tested to 
see to what extent a rat . . . would 
generalize . . . [p. 38, italics ours]”; 
Deese (1952) states: “. . . subjects 
seemed to generalize . . . [p. 67, ital- 
ics ours]”; and McGeoch and Irion 
(1952) state: “Primary stimulus gen- 
eralization depends upon . . . inherent 
dimensionalization . . . [p. 68, italics 
ours].” The point is that the phe- 
nomena resulting from the set of oper- 
ations are interpreted in terms of an 
active process on the part of the sub- 
ject, i.e., the subject is doing some- 
thing (generalizing, dimensionalizing). 
In brief, the very language by which 
certain operations in transfer and 
learning are described carries with it 
the implicit assumption that there is
an underlying active process associated
with the results of the operations by 
which stimulus generalization is de- 
fined: the word generalize is a verb. 


The Role of the Stimulus 


The contemporary treatment of gen-
eralization derives, it appears to us, in
part from the ambiguous status of the 
nature of the effective stimulus, as well 
as from premises about the stimulus 
or effective stimulus implicit in the 
behaviorist movement. A number of
psychologists have been concerned
with problems inherent in the use of,
and meaning implicit in, the word
“stimulus,” and, among them, Gibson’s
(1960) discussion is particularly 
thorough. Certain of these problems 
merit some amplification here. 

An important aspect of most gen-
eralization studies lies in a definition of
the conditioned stimulus as well as of
those events which serve as test stim-
uli. As noted earlier, the trend to
define effective stimuli in terms of 
physical dimensions was present with 
Pavlov’s early research, and has con- 
tinued to the present day. Thus an 
effective stimulus is, in many contem- 
porary usages, a physical event of a 
particular amount, intensity, duration, 
etc., selected by the experimenter and 
defined as the effective stimulus 
(which, thereby, defines relevant stim- 
ulus dimensions) for the subject. 

There is certainly no reason to as- 
sume, however, that the experiment- 
er’s delineation of a physical event and 
the subject’s perception of it are iso-
morphic. What represents an impor-
tant dimension of the physical event
for the experimenter may not even 
exist as part of the effective stimulus 
for the subject. Similarly, the sub- 
ject may perceive aspects of an ex- 
perimenter event which have been ig- 
nored by, or are unknown to, the 
experimenter. 

Another problem concerns the as- 
sumptions related to the concepts of 
stimulus dimensions and the afferent 
generalization continuum, this latter 
concept being the differential afferent 
response which corresponds in varying 
degrees to variation in a given stimu- 
lus continuum. We are not concerned 
as to whether there are innately de-
termined stimulus dimensions,⁵ but
rather, given a stimulus dimension, we
would question the assumption that
variation in a subject’s responses found
in the typical stimulus generalization
experiment must be accounted for in
terms of the postulation of a special
process (generalization) rather than in
terms of how set, attention, etc. affect
the manifestation of discrimination.


5 Experimental evidence in this area is
ambiguous. A study by Ganz and Riesen
(1962) supports the notion that there is in-
nate dimensionalization, though their pro-
cedure of combining data across days tends
to obscure the horizontal “gradient” ob-
tained on the first session. On the other
hand, Peterson (1962) found complete gener-
alization across a wide series of wave length
values in ducks reared in an environment of
a narrow band of wave lengths. In either
event, the results carry no specific limita-
tions on our position concerning stimulus
generalization.


Moreover, we would further ques-
tion the general assumption that the
natural unit of measurement for the
afferent generalization continuum is
the jnd. That we have established
psychophysical scales is not to be de-
nied; rather it is to be pointed out
that such scales are the product of ex-
tensive training of experienced ob-
servers. Subjects in psychophysical
investigations have been trained to at-
tend to a single aspect of an experi-
menter event; i.e., all responses to
other attributes of the event are, as
much as possible, extinguished. The
resulting scales may approximate the
limits of sensory functions but this, if
true, says nothing about the attributes
of the external world initially selected
and perceived by the subject. Subjects
trained and tested in investigations of
generalization are not systematically
told to what to attend, and, as a num-
ber of investigators have suggested
(e.g., Jenkins & Harrison, 1960; Reyn-
olds, 1961), organisms may attend to
only one of several aspects of the en-
vironment which are consistently
present.

Under these circumstances we see
no defensible rationale for plotting
“generalization gradients” as a func-
tion of a psychophysical scale. Such
a comparison—between a scale de-
termined by specific training to a sin-
gle attribute of a preselected stimulus
and a gradient obtained following sin-
gle experimenter event training—has
no special meaning unless it is assumed 
that the single event training has, in- 
deed, forced the subject to attend to 
the attribute measured in the psycho- 
physical judgment situation. Such an 
assumption, of course, would be 
strictly tongue-in-cheek and gratuitous 
unless discrimination operations were 
carried out: such operations, however, 
by the accepted standards of measuring 
stimulus generalization are not per- 
missible. 

If it is acknowledged that the effec- 
tive stimulus in a homogeneous labo- 
ratory environment is not necessarily 
known to the experimenter, and when 
it is recognized that subjects behave 
as though they select, attend, and de-
velop sets (Hebb, 1949, pp. 1 ff.), then
it is difficult to know how to interpret 
the behavioral fact that a subject can 
be trained to perform a response in the 
presence of Event A and may, subse- 
quently, also perform it in the presence 
of Event B. It is clear, however, that 
a special process of generalization need 
not be invoked. For instance, what 
some authors have interpreted as the 
“biological utility” of stimulus general- 
ization might be reinterpreted along 
the lines that the changes in physical 
events which necessarily occur with 
successive event presentation may not, 
so far as the subject is concerned, con- 
stitute a change at all. That is, no 
biologically useful process need be im- 
puted just because imperfect repro- 
duction of successive events results in 
the “same” response—the effective 
stimulation could be the same despite 
imperfect reproduction.

Because of these problems in de- 
fining the stimulus, it should be noted, 
further, that the so-called empirical 
definition of stimulus generalization 
entails more than empirics. As a spe- 
cific example, the manner in which the 
typical “empirical definition” is stated 
is such as to carry with it the assump-
tion that a change in experimenter
event is a change in the effective stim- 
ulus in at least one of its attributes. 


The Problem of Discriminability of 
Stimuli 


That some of the results which have 
been labeled stimulus generalization 
can be interpreted in terms of a failure 
on the part of the organism to dis- 
criminate between the training and test 
events has been rejected on the 
grounds that in a number of studies 
the confusion of stimuli seems to have 
been improbable. For instance, Kim- 
ble (1961) has cited four studies 
(Bass & Hull, 1934; Brown, Bilodeau, 
& Baron, 1951; Hovland, 1937b; Lund- 
holm, 1928) which, he suggests, are 
difficult to force into a failure of dis- 
crimination explanation. We will ex- 
amine three of these, omitting the 
Lundholm study since the major prob- 
lem for that investigator was to study 
artificially induced anesthesia—an area 
considerably divorced from our major 
concern. 

Brown, Bilodeau, and Baron (1951) 
required subjects to respond to the 
lighting of seven spatially distinct 
lamps, but not to react to the lighting 
of any of the others. Instructions em- 
phasized that subjects (a) should re- 
act as quickly as possible and (b) 
should not be unduly concerned with 
false responses. Gradients of response 
frequency were found as a function of 
the spatial separation of the lamps. 
Presumably, the spatial positions were 
completely discriminable. 

Our point about this experiment is that
the instructions given to the subjects
to react as quickly as possible and to
ignore false responses seriously attenu-
ate any discrimination operation that
the subjects would make in this situ-
ation. Had the instructions empha-
sized accuracy and allowed unlimited time to
respond, we would predict that no
gradients would have been obtained. 
In this respect, the authors stated: “If 
the subjects had been instructed to take 
plenty of time in responding . . . it is 
relatively certain that no false re- 
sponses would have been observed.” 
In brief, that events are discriminated 
under some sets of circumstances does 
not mean that they are discriminated 
under others. 

Hovland (1937b) obtained almost 
complete generalization of the GSR 
with test tones which were 50, 100, and 
150 intensity jnd’s from the training 
tone. Slivinske and Hall (1960) ex-
amined the discriminability of tones
which Hovland used and found that 
they were not absolutely discriminable. 
It must be acknowledged, however, 
that Slivinske and Hall did observe 
that the lack of discriminability was 
restricted primarily to tones immedi- 
ately adjacent to the CS, and that their 
subjects were able to discriminate the 
tone farthest removed from the CS 
with almost 100% accuracy. 

Two further points are relevant, 
however. First, although psycho- 
physical techniques revealed that sub- 
jects could discriminate easily among 
the varying tones, discriminability 
among them was reduced considerably 
when they were placed in an abso- 
lute judgment situation. Second, that 
there was little or no decrement in re- 
sponding as a function of the distance 
of the test tone to the CS suggests that 
the subject had learned to respond 
only to a tone with little regard to the 
loudness dimension. Thus, there was
no assurance that the experimenter’s
specification of the effective stimulus
was isomorphic with the subject’s
perception.


6 The use of intensity as a dimension re-
sults in a complication. As Kimble (1961)
has pointed out,

Intensities lower than the CS will lead to
lessened response strength because of gen-
eralization decrement and the directly
weakening effect of the lower intensity.
With higher intensities, these two factors
act against each other. ... In classical
conditioning, the bulk of the evidence sug-
gests that the intensity function is more
important than the generalization function.


The third study cited by Kimble
is that of Bass and Hull (1934) who
obtained spatial generalization gradi-
ents using the conditioned GSR. Our
analysis of the authors’ findings is
similar to that which we have pre-
sented for the Hovland study. We
would suggest that the subjects had
learned to respond to an attribute of
the conditioned stimulus (sudden ex-
ternal change) which was not con-
sidered by the experimenters.

In general we believe that in many
of the generalization studies, and par-
ticularly those employing the GSR,
changes in the environmental situ-
ation produced by CS presentation
provide a stimulus attribute or dimen-
sion to which the organism may re-
spond. The studies of Kimmel (1959)
and Prokasy, Hall, and Fawcett (1962)
illustrate the difficulty of separating
the effects of sensitization and condi-
tioning procedures in the GSR situ-
ation.

Such a problem is not peculiar to
GSR studies, however. Wickens and
Wickens (1942) had one group of rats
first learn an escape response to a
sudden shock onset. When the rats
subsequently were exposed to sudden
versus gradual onset of light, sudden
light onset produced more escape re-
sponses than did gradual light onset.
Similarly, training rats to escape grad-
ual shock onset resulted in more fre-
quent responses to gradual light onset
in subsequent tests. Here we would
infer (as we would with Pavlov’s
data, described earlier) that onset
characteristics of an event may con-
trol, at least in part, various attributes
of responding. Again, generalization 
need not be invoked. 

The question arises as to how such 
an explanation can account for the de- 
creasing response magnitude plotted 
as a function of the distance between 
test stimuli and CS sometimes reported 
in the literature. We know, for exam- 
ple, that changes in the environment 
may induce altered response levels 
(Grings, 1960) and that the introduc- 
tion of different physical events can 
induce other response characteristics 
(such as orienting responses). There 
is no reason why such changes cannot 
suppress, partially, a response that has 
been initiated, or why such changes, 
related to the past history of the organ- 
ism, cannot induce responses where, 
ordinarily, one might not have been 
expected. 


Methodological Considerations 


Inferences about stimulus generali- 
zation are usually made from response 
measurements obtained on extinction 
or test trials which follow a number 
of learning or conditioning trials.
Whether or not such measures reflect 
a generalization process, rather than 
discrimination, assuming both to exist, 
will depend upon the degree to which 
the response measures are not con- 
taminated by phenomena not under 
investigation. 

Probably the most frequently used 
operation for examining generalization 
is an extinction situation in which all 
of the test stimuli are presented for 
a number of test trials. Thus, Hov- 
land (1937a, 1937b) trained subjects 
for 16 conditioning trials to a particu- 
lar CS and then presented extinction 
trials in which the original CS as well 
as three test stimuli were presented in 
a balanced order four times each. 

A slight variation of this technique 
is to utilize a variable interval rein-
forcement schedule (e.g., Guttman &
Kalish, 1956) and measure the rate of
responding to the varying test stimuli.
The use of such a reinforcement sched- 
ule has the effect of enabling the ex- 
perimenter to obtain a large number 
of responses during the test series. 

In contrast to varying the number 
of test stimuli presented during the 
extinction situation, some investigators 
have presented only a single stimulus 
during the test trials. Thus, Wickens, 
Schroder, and Snide (1954) admin- 
istered 16 conditioning trials and fol- 
lowed them with the administration of 
eight test trials in which only a single 
stimulus was presented. Different test 
stimuli were employed with different 
subjects. Thompson (1958) has em- 
ployed essentially this technique in 
avoidance conditioning. In contrast to 
Wickens, Schroder, and Snide (1954), 
however, Thompson counted as his 
index of generalization the number of 
test trials up to and including the first 
failure of the subject to avoid in the 
presence of the test stimulus. 

The final technique⁷ is that of test-
ing only once with a single test stimu-
lus immediately following the condi-
tioning trials, a method utilized by 
Grant and Schiller (1953) and Hall 
and Prokasy (1961). Grant and 
Schiller have urged this form of test- 
ing on the grounds that it avoids com- 
plications of extinction effects and 
potential discrimination training. 

7 We recognize that other techniques for
examining generalization have been utilized
(e.g., Grandine & Harlow, 1948; Lashley &
Wade, 1946). These techniques, however,
provide a discrimination task for the subject
on test trials which, we believe, places them
in a somewhat different category than those
cited.


Each of the delineated techniques
has limitations in providing a basis for
valid inferences about the relationship
of response strength to the event di-
mension being examined. Those in-
volving multiple test stimuli provide
a situation in which the subject has 
the opportunity to compare and con- 
trast the stimulus used in the condi- 
tioning trials with the test stimuli. 
Moreover, it is assumed that the ex- 
tinction operations do not interact 
with the effects of varying the test 
stimuli, the tenability of which has yet 
to be established. That this procedure 
does affect the gradients obtained is 
amply illustrated by Hiss and Thomas 
(1963). 

A necessary assumption with the 
single test trial method is that the re- 
moval of the reinforcement (or UCS 
in classical conditioning) does not in 
some way affect the response measure 
obtained on the test trial. Whether or 
not this is true may depend upon the 
characteristics of the response meas- 
ure. Where the GSR is used, for ex- 
ample, its latency is such that on test 
trials following training with an in- 
terstimulus interval of 500 millisec- 
onds, the subject has time in which to 
exhibit an altered response contingent 
upon removal of the UCS. Grings’ 
(1960) recognition of such GSR 
changes has led him to invoke a per- 
ceptual disparity process. 

Finally, a general assumption made 
for all methods which incorporate ex- 
tinction test trials is that responses 
made during extinction reflect ade- 
quately those central nervous system 
changes which are presumed to have 
taken place during conditioning trials. 
The validity of this assumption must 
be questioned since a number of studies 
have demonstrated that extinction 
measures have little relationship to 
other indices of response strength. In 
the last analysis, the validation of such 
an assumption must await adequate de- 
scription of what takes place during 
the extinction trials. 

A final methodological consideration 
is that the operations by which gen-
eralization is defined differ from those
of discrimination training only in that 
there is no experimenter-administered 
differential reinforcement of the two or 
more experimenter events under ex- 
amination. The omission of differ- 
ential reinforcement on test trials 
might provide a clue to subject-defined 
stimulus equivalence, but it does not 
appear necessary to invoke a special 
label for an assumed process (general- 
ization) just because this operation is 
omitted.


Concluding Comments 


It is our conclusion that the infer- 
ence that two events are discriminated 
is something about which the experi- 
menter has little substantial knowl- 
edge. That the subject makes the 
same response to two events may not 
mean, in two steps, that the events 
were discriminated and then that the 
subject generalized. If the events 
were not discriminated, the conclusion 
that generalization occurred is un- 
warranted. The assumption of dis- 
crimination, then, automatically pro- 
vides the illusion of something else: 
generalization. 

If, on the other hand, there is some 
rationale for believing that the subject 
has discerned a shift in experimenter 
event, yet responds (see Hiss and 
Thomas, 1963, who observed system- 
atic latency increases as training and 
test stimuli differed), whether or not a 
distinct process, labeled generalization, 
need be invoked depends upon how one 
interprets the role of other variables 
which can operate. For example, if, 
in changing from training to test 
event, the subject detects a change in 
the external environment, responses 
to this change may occur (such as ori- 
enting responses; Sokolov, 1960) 
which in turn may alter or delay the 
characteristics of the CR. This, how-
ever, need have no direct bearing on
whether or not the attribute of the 
external event, though changed, is 
perceived as having been changed. 
The point is that a subject may detect 
environmental changes, which in turn 
affect other response characteristics, 
but this need not mean that for him 
the effective stimulus has been altered. 

When a subject, in a laboratory set- 
ting, behaves in a way which tells us 
either that two events lead to the same 
response, or that two events lead to a 
similar, but altered, response, we are 
confronted with the simple fact that 
there is a prelaboratory history of per- 
ceptual organization (either genetically 
given, behaviorally shaped, or verbally 
instructed), a training situation and a 
test situation. To label one class of 
myriad contributing factors as dis- 
crimination, or the effects upon dis- 
crimination, and to label another class 
as generalization, or the effects upon 
generalization, seems to us to be an 
unnecessary complication in behavior 
analysis. 

Thus, that the label stimulus gen- 
eralization, as representing something 
unique from that represented by “dis- 
crimination” and such operationally 
distinguishable factors as motivation, 
attention, orienting reflex, etc., can be 
challenged in terms of the inadequacy 
of experiments to differentiate proper- 
ties unique to generalization, as well 
as in terms of the language by which it 
is described, casts serious doubt on the 
usefulness of this construct. Our posi- 
tion is that it is more parsimonious to 
refer to discrimination, or lack of dis- 
crimination, and to the effects of ma- 
nipulated variables on discrimination. 
Recognizing the limitations of the 
usual generalization tests, one might, 
however, conceive of the better tests of 
generalization as a set of operations 
which serve to tell us about which 
aspects of the environment control re-
sponse variance. Such techniques may
enable us to “map” the perceptual 
world of the subject under specified 
prelaboratory and laboratory environ- 
ments. 

The possibility is examined that grammatical structure is acquired by
“contextual generalization,” a type of generalization which results
from an S’s learning the position of a unit in a sequence. Several experi-
ments, in which Ss learn miniature artificial languages, demonstrate and
explore certain aspects of this type of generalization in verbal learning. 
From the experiments a theory of the learning of grammatical structure 
is developed. Confrontation of the theory with facts about the struc- 
ture of natural languages, especially English, suggests that it may be 
tenable, provided its scope is narrowed in ways discussed, and provided 
that an additional assumption is made—that as well as learning the loca- 
tions of units, Ss form paired associates whose foci are closed-class 
morphemes, i.e., articles, auxiliaries, affixes, etc. 


Just how virtually every human 
child contrives to learn his native 
language probably constitutes the 
most arresting mystery in psychology. 
A salient feature of the development 
is the relative rapidity with which 
the complexity of sentence structure 
increases during the initial period of 
acquisition. The purpose of the 
present paper is to explore the poten- 
tialities of the concept, ‘‘contextual 
generalization,” for explaining the 
acquisition of grammatical structure, 
especially those aspects of grammati- 
cal structure which have to do with 
word order (which constitute much 
of the grammar in English). 

For verbal learning, contextual 
generalization may be defined in- 
formally as follows: when a subject, 
who has experienced sentences in 
which a segment (morpheme, word, 
or phrase) occurs in a certain position
and context, later tends to place this
segment in the same position in other
contexts, the context of the segment
will be said to have generalized, and
the subject to have shown con-
textual generalization.


1 The writer is indebted to the Superin-
tendent of Schools, and to the Principals of
Woodlin and Rosemary Hills elementary
schools, of Montgomery County, Maryland,
for their cooperation in making available
subjects and school facilities for Experiments
I-IV. For Experiment V a similar debt is
owed to James Hymes of the University of
Maryland nursery school.

Thus defined, contextual general- 
ization falls within the general rubric 
of stimulus and response generaliza-
tion. One speaks of “stimulus gen-
eralization” when a subject, who has
learned to make a certain response 
to a stimulus S₁, later makes the same
response to a new stimulus S₂ which
is like S₁. In stimulus generalization,
the mediating property (the way in
which S₂ is like S₁) is usually conceived
to be some intrinsic property of S₁
and S₂, e.g., color, shape, etc. Simi-
larly, in response generalization, the
mediating property of R₁ and R₂ is
usually thought of as some intrinsic 
property of the responses. Although 
the properties mediating generaliza- 
tion are usually intrinsic ones, there 
seems to be no particular reason why 
this should be so, and contextual 
generalization appears to be a special 
case where the mediating property is 
an extrinsic one, namely, temporal 
location in an utterance. 
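
The informal definition above can be made concrete with a minimal sketch. It is an illustration only, not a procedure reported in this paper: the function names and the nonsense words are hypothetical, and "position" is simplified to a word's ordinal place in a whitespace-separated sentence.

```python
from collections import defaultdict


def learn_positions(sentences):
    """Record, for each word, the sentence positions in which it has occurred."""
    positions = defaultdict(set)
    for sentence in sentences:
        for index, word in enumerate(sentence.split()):
            positions[word].add(index)
    return positions


def fits_learned_positions(sentence, positions):
    """True if every word occupies a position in which it was experienced."""
    return all(index in positions.get(word, set())
               for index, word in enumerate(sentence.split()))


if __name__ == "__main__":
    # Hypothetical two-word training sentences built from nonsense words.
    learned = learn_positions(["KIV BEW", "JUF MUB"])
    print(fits_learned_positions("KIV MUB", learned))  # new pairing, old positions
    print(fits_learned_positions("MUB KIV", learned))  # old words, new positions
```

In this sketch a sentence never experienced as a whole (KIV MUB) is accepted because each word keeps its learned position, whereas the reversed order is rejected. The mediating property is thus extrinsic, location in the utterance, rather than any intrinsic feature of the words or any prior association between them.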

There are various suggestions in the
literature that learning word order is 
similar to learning the associative 
connections manifest in word associa- 
tion tests. According to an early view 
of Miller (1951), “controlled associa- 
tions are quite similar to the choice of 
successive words in speech [p. 186].”
Consistently with this idea, Miller 
leaned heavily on the concepts of 
“contextual constraint” and “transi-
tional probability”: “By grammatical
habits we mean the operations of
contextual constraints upon the se-
quence of symbols [p. 185].” While
these concepts have been shown to be 
relevant to some verbal learning and 
memory experiments, no one has tried 
to develop these ideas to the point 
at which they can be taken seriously 
as a theory of the learning of gram- 
matical structure. Perhaps this fail- 
ure reflects the difficulties of such an 
enterprise. 

A central difficulty of a theory 
based on learned associations between 
words is its clumsiness in handling 
generalization phenomena that obvi- 
ously occur in verbal behavior. Thus, 
if we are told, for example, that 
PEOPLE KIVIL EVERY DAY, we are 
likely to deduce the existence of a 
verb TO KIVIL, and to accept sentences 
like GEORGE KIVILS as grammatical. 
Similarly, sentences like IRON FLOATS, 
or COLORLESS GREEN IDEAS SLEEP 
FURIOUSLY (Chomsky’s 1957 ex- 
ample), would probably be recognized 
as having the structure of English 
even though it is doubtful that any of 
the words have ever previously been 
associated with each other in the 
experience of most English speakers; 
certainly such sentences would be 
considered more grammatical than 
WE ARE GOING TO SEE HIM IS NOT 
CORRECT TO CHUCKLE LOUDLY AND 
DEPART FOR HOME, in which the 
associational bonds between the words
are quite strong (Miller, 1951, p. 81).
It seems obvious therefore that there 
must exist generalization mechanisms 
in language learning whereby a word 
learned in one context generalizes to 
another context, even though no 
associations may have previously 
formed between the word and its new 
context. 

In any explanation of the learning 
of grammatical structure it seems to 
the writer that some such generaliza- 
tion mechanism will have to occupy a 
central position. The present paper 
explores whether contextual general- 
ization is a serious candidate for this 
role. 

The paper is divided into two parts. 
The first part reports a series of ex- 
periments in which children learn 
some miniature artificial languages 
with nonsense syllables as words. 
Contextual generalization is first de- 
monstrated as a phenomenon, and 
then various problems associated with 
the concept are explored. From the 
experiments, the general lines are 
sketched of a theory of the learning of 
grammatical structure based on con- 
textual generalization—based, that is, 
on the notion that “What is learned” 
are primarily the proper locations of 
words in sentences. 

Although experiments with artificial 
languages provide a vehicle for study- 
ing learning and generalization proc- 
esses hypothetically involved in learn- 
ing the natural language, they cannot, 
of course, yield any direct information 
about how the natural language is 
actually learned. The adequacy of a 
theory which rests on findings in work 
with artificial languages will therefore 
be judged by its consistency with data 
on the structure and development of 
the natural language. In the second 
part of the paper an attempt is made 
to confront the theory developed with 
known facts about the grammatical
structure of natural languages, especially
English, so as to discover the
limitations of the theory, the stum- 
bling blocks it faces, and the resources 
it can draw upon to meet the stum- 
bling blocks. 


EXPERIMENT I: THE A + P 
LANGUAGE WITH WORD 
CONSTITUENTS 


This experiment demonstrates, for 
a very simple language, that the 
position of a word in a verbal array 
can be the “functional stimulus” 
mediating generalization. 


Method 


Description of the language. There were 
two classes of words, A words and P words, 
and sentences were always two words long 
and consisted of an A word followed by a P 
word. The words were low-association value 
nonsense syllables. KIV, JUF, and FOJ were
the A words, and BEW, MUB, and YAG the
P words. Two words of each class were used 
during the initial learning, and the third was 
introduced in generalization trials. 

Procedure. The subject was told that he 
was going to play a sort of word game in 
which he would learn a bit of a new language 
which might seem strange because he would 
not know what any of the words meant. 
The words (written on 1.5 × 3 inch cards)
were shown to him and a consensus about 
their pronunciation was reached. 

The “language” was taught through a 
series of sentence-completion problems. A 
word was presented on the ledge of a board, 
either preceded or followed by a vacant 
position; A words were always presented on 
the left, and P words on the right of the 
vacant position. In each problem, the 
subject was given two words, one of each 
class, to choose from to complete the sentence 
by placing his selection in the vacant position. 
Before each problem the subject was asked 
“Do you remember how this one goes?” or 
“How do you think this one should go?” 
If he chose the correct word he won a poker 
chip (eight poker chips were worth a choco- 
late); if he chose wrongly he was shown the 
correct answer. After each problem, the 
correct sentence was read aloud by him, and 
repeated by the experimenter. 

In the initial learning two A words (KIV
and JUF) and two P words (BEW and MUB)
were used. Four sentences can be formed
from these, and eight sentence-completion 
problems can be constructed since each 
sentence can be formed either by filling in the 
first word or by filling in the last word. The 
initial learning had two stages. Four of the 
eight problems were first selected and were 
presented in random order until learned to a 
criterion of seven successive correct responses 
(not including the first presentation of the 
first problem, which was used to demonstrate 
the procedure to the subject). Then the 
remaining four problems were presented once 
each, followed by all eight problems in 
random order until correct responses to 
seven successive presentations were made. 
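
The count of four sentences and eight sentence-completion problems given for the initial learning can be checked by enumeration. The following Python sketch is illustrative only: the variable names and the dictionary layout of a problem are assumptions made here and are not part of the original procedure.

```python
from itertools import product

A_WORDS = ["KIV", "JUF"]  # A words used in the initial learning
P_WORDS = ["BEW", "MUB"]  # P words used in the initial learning

# Every sentence is an A word followed by a P word.
sentences = list(product(A_WORDS, P_WORDS))

# Each sentence yields two completion problems: one with the first
# position vacant and one with the last position vacant.
problems = []
for a_word, p_word in sentences:
    problems.append({"shown": ("_", p_word), "answer": a_word})
    problems.append({"shown": (a_word, "_"), "answer": p_word})

print(len(sentences), "sentences,", len(problems), "problems")
```

In this representation the generalization problems would differ only in that the word shown is a new one (FOJ or YAG).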

After the initial learning four generalization 
problems were each presented once to discover 
whether the subjects had registered the posi- 
tions of the words used in the initial learning. 
A new A or P word was presented, with the 
alternatives always being words used in the 
initial learning. The generalization problems 
were (with the alternatives in parentheses):
FOJ ____ (KIV, BEW); ____ YAG (MUB, KIV);
____ YAG (JUF, BEW); FOJ ____ (MUB, JUF).

Subjects. The subjects were 16 children, 
aged 9-6 to 10-5, 8 boys and 8 girls. Non- 
readers were excluded. 


Results 


The initial learning was accom- 
plished quite rapidly, two subjects 
making no errors whatever following 
the initial demonstration problem. 
As a demonstration of contextual 
generalization, the main interest of 
the experiment lies in the performance 
on the generalization problems. In 
78% of these problems the subjects 
filled the vacant position with the 
word that had occupied this position 
in the initial learning. Using the 
binomial expansion, one would expect 
by chance that, among 16 subjects, 
5 subjects would respond correctly 
on either all four or three of the four 
generalization problems, 6 subjects on 
two problems, and 5 subjects on one 
problem or none. The obtained 
figures were 12, 4, and 0 subjects, 
respectively. The tendency to re- 
spond correctly is highly significant 
(χ² = 15.5, df = 2, p < .001).

A vacant second position was filled
in correctly in 91% of the problems 
(13 subjects correct on both problems, 
and 3 subjects on one or none), as 
compared with 66% for a vacant 
first position (8 subjects on both 
problems, and 8 subjects on one or 
none). Tested by chi square for 
correlated proportions, the difference 
between the positions is not sta- 
tistically significant. 
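
The chance expectation cited above for the four generalization problems can be reproduced with a short computation. The sketch below assumes that each two-alternative problem is answered correctly with probability .5 and that the reported statistic is an ordinary Pearson chi square over the three groupings named in the text; the function names are hypothetical.

```python
from math import comb


def expected_counts(n_subjects, n_problems, p_correct, groups):
    """Expected number of subjects whose number correct falls in each group."""
    probs = [
        comb(n_problems, k) * p_correct**k * (1 - p_correct) ** (n_problems - k)
        for k in range(n_problems + 1)
    ]
    return [n_subjects * sum(probs[k] for k in group) for group in groups]


def pearson_chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over the groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Groups: three or four correct, exactly two correct, one or none correct.
groups = [(3, 4), (2,), (0, 1)]
expected = expected_counts(16, 4, 0.5, groups)
observed = [12, 4, 0]
chi2 = pearson_chi_square(observed, expected)

print(expected, round(chi2, 1))
```

Under these assumptions the expected counts are 5, 6, and 5 subjects, and the statistic rounds to the reported 15.5.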

After the generalization problems, 
the subjects who had correctly com- 
pleted at least three of the four 
sentences were asked why they picked 
the one they did. The two most fre- 
quent explanations were “It sounded 
right,” and “I remembered” (when 
asked what they remembered these 
subjects said they did not know— 
further questioning usually elicited 
that they had not realized that all the 
sentences in the generalization prob- 
lems were new). Only one subject 
said anything to indicate that he had 
noticed the constant positions of the 
words. These explanations suggest 
that the subjects responded un- 
wittingly to the positional cue and 
that this cue was temporal rather than 
spatial in nature, and auditory rather 
than visual (i.e., the temporal position 
of the word in the sentence when read 
aloud or rehearsed subvocally, rather 
than its left or right position in the 
visual display). Further evidence 
for this is provided in a later ex- 
periment. 


Conclusion 


The results indicate that subjects 
who have experienced sentences in 
which words occur in a certain posi- 
tion and context tend to place these 
words in the same positions in new 
contexts. Such behavior indicates 
the learning of an association of words 
with their positions, the context
generalizing. A suggestion as to the
mechanism of contextual generalization
is contained in the subjects’
explanations of their responses: per- 
haps the repeated experience of a 
certain word in a certain position 
makes it sound familiar, and therefore 
sound “right,” in this position, even 
though the original context be 
changed. 


Discussion 


Extrapolating from the above con- 
clusion, it is possible to hazard a 
guess about the infant’s learning of 
grammatical word order. Perhaps he 
constantly hears the same expressions 
recurring in the same positions in his 
verbal environment; these therefore 
come to sound familiar and therefore 
“right” to him in these positions, and 
consequently in his own language he 
reproduces the same positional rela- 
tionships. Such a theory would make 
the learning of grammatical word 
order a special case of “Gibsonian”
perceptual learning (Gibson & Gib- 
son, 1955), i.e., a process of auditory 
differentiation, or of becoming fami- 
liar with, the temporal positions of 
expressions in utterances. Perceptual 
learning is usually assumed to be a 
rather primitive process and there is 
therefore no reason to suppose that it 
demands much in the way of intel- 
lectual capacity in the learner. Learn- 
ing of this sort would therefore satisfy 
at least one requirement of any 
process postulated to be involved in 
first language learning, namely, that 
it not require intellectual capacities 
obviously beyond the reach of the 
2-year-old. 

The most immediate problem in the 
above line of thought concerns the 
definition of position. In the learning 
of a language more complex than the 
one used in this experiment, two 
closely related questions arise. One
question concerns whether it is the
absolute positions of expressions (e.g.,
first, second, etc.) that are learned, or 
the positions of expressions relative to 
other expressions. A less obvious but 
probably more important question 
concerns the nature of the elements 
that fill positions. It is perhaps most 
natural to assume that the word is 
the principal element whose position 
is learned. If the word is the only 
element whose position is learned, 
then for English, position must be 
defined relatively and not absolutely, 
since almost any English word can 
occur in any absolute position in a 
sentence. However, some exploratory 
work suggested that the relative 
positions of words were rather difficult 
to learn. It seemed, therefore, that it 
might be fruitful to question the 
assumption that there is anything 
inevitable about the word as the sole 
or principal element whose position 
is learned. 

An alternative assumption is that 
the elements are hierarchically
organized: a sentence would be assumed
to contain a hierarchy of elements in 
which longer elements (e.g., phrases) 
contain shorter elements (e.g., words) 
as parts, with the positions that are 
learned always being positions within 
the next larger element in the hier- 
archy. A hierarchy of elements would 
require that expressions of any length 
can be elements whose position is 
learned; words (or morphemes) would
be merely the smallest elements in the 
hierarchy. At each level in a hier- 
archical scheme position could be 
defined either absolutely or relatively. 
A simple example of a hierarchical 
scheme in which position is defined 
absolutely is provided by binary 
fractionation. In this scheme a 
sentence contains just two positions, 
a first and a last, and each expression 
in these positions can itself contain 
two positions, a first and a last, and
each resulting expression is potentially
divisible in turn in like manner, etc. 
It may be noted that this method of 
defining position seems relevant to 
English structure, e.g., the verb in 
English generally constitutes the first 
part of the last section of a sentence. 
A hierarchical organization of ele- 
ments is assumed in immediate con- 
stituent analysis in structural lin- 
guistics (cf. Chomsky, 1957). 
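
Binary fractionation can be pictured as a nested pairing in which every expression has only a first and a last part. The following sketch is an illustration under stated assumptions: the bracketing of PEOPLE KIVIL EVERY DAY is chosen here for the example and is not given in the text, and the function name is hypothetical.

```python
def position_paths(tree, path=()):
    """Map each word to its sequence of first/last positions in the hierarchy."""
    if isinstance(tree, str):
        return {tree: path}
    first, last = tree  # every non-terminal expression has exactly two parts
    paths = {}
    paths.update(position_paths(first, path + ("first",)))
    paths.update(position_paths(last, path + ("last",)))
    return paths


# Assumed bracketing: [PEOPLE [KIVIL [EVERY DAY]]]
sentence = ("PEOPLE", ("KIVIL", ("EVERY", "DAY")))
paths = position_paths(sentence)

for word, path in paths.items():
    print(word, "->", " of ".join(reversed(path)))
```

With this bracketing the verb KIVIL is located as the first part of the last section of the sentence, which is the kind of absolute, hierarchical definition of position described above.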


EXPERIMENT II: THE A + P
LANGUAGE WITH PHRASE 
CONSTITUENTS 


This experiment was designed to 
investigate whether the ease and 
effectiveness of learning is related to 
the way in which position is defined 
for the subject. The learning of the 
same language was compared under 
conditions (a) when words were the 
units and the positional cues were 
relative, and (b) when phrases (either
one or two words long) were the units 
and the positional cues were absolute 
—first and last. It was expected that 
Condition b would prove simpler. 
The initial learning procedure was 
designed to match that of Experiment 
I, so that the learning scores could be 
compared with those for the ap- 
parently simpler language of Ex- 
periment I. 


Method 


Description of the language. The language 
was that of Experiment I, except that an 
additional word (GED) always preceded two 
of the three A words (JUF and FOJ), and
another additional word (POW) always followed
one of the P words (BEW). Sentences
could thus vary from two to four words in 
length, and the A and P words were sometimes 
first and second, and sometimes second and 
third; the relative ordinal positions were, 
however, the same as in Experiment I, i.e., 
A words (KIV, JUF, and FOJ) always immediately
preceded P words (BEW, MUB, and
YAG). 

An alternative way of specifying this
language is to say that any of three A phrases
(KIV, GED JUF, GED FOJ) could be followed by 
any of three P phrases (BEW POW, MUB, YAG). 
When described in this way the grammar is 
an exact replica of the grammar taught in 
Experiment I, but with phrases instead of 
words as the immediate constituents. 
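
The equivalence of the two descriptions of this language can be checked by generating the sentences from the phrase-level description and testing them against the word-level rules. This is a sketch only; the tuple representation and the function name are assumptions made for illustration.

```python
from itertools import product

A_PHRASES = [("KIV",), ("GED", "JUF"), ("GED", "FOJ")]
P_PHRASES = [("BEW", "POW"), ("MUB",), ("YAG",)]
A_WORDS = {"KIV", "JUF", "FOJ"}
P_WORDS = {"BEW", "MUB", "YAG"}

# Phrase-level description: any A phrase followed by any P phrase.
sentences = [a + p for a, p in product(A_PHRASES, P_PHRASES)]


def obeys_word_rules(sentence):
    """Check the word-level description of the same language."""
    for i, word in enumerate(sentence):
        prev_word = sentence[i - 1] if i > 0 else None
        next_word = sentence[i + 1] if i + 1 < len(sentence) else None
        if word in ("JUF", "FOJ") and prev_word != "GED":
            return False  # GED always precedes JUF and FOJ
        if word == "BEW" and next_word != "POW":
            return False  # POW always follows BEW
        if word in A_WORDS and next_word not in P_WORDS:
            return False  # A words immediately precede P words
    return True


lengths = sorted({len(s) for s in sentences})
print(len(sentences), lengths, all(obeys_word_rules(s) for s in sentences))
```

The nine sentences so generated run from two to four words in length, and each satisfies the word-level rules.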

Procedure. The procedure had four parts:
initial learning, second learning, learning test,
and recall test. To match Experiment I, the
initial learning used only two A terms (KIV,
and JUF preceded by GED) and two P terms
(BEW followed by POW, and MUB). Using a
sentence completion procedure as before, the 
language was taught in two ways. For one 
group of subjects (Group 1) only the position 
of the A or P word was left vacant in a 
problem, and the choices available for 
sentence completion were always an A and a
P word. (For this group the two new words
GED and POW never had to be filled in.) For
another group of subjects (Group 2), the 
sentences were presented in each problem 
lacking the whole A or P phrase (i.e., either 
one or two words as the case might be), and 
the choices available for completing the 
sentence were always a whole A and P phrase. 
(When the choices were of unequal length, 
these subjects of course were not given any 
cues as to the length of the sentence to be 
constructed.) The initial learning was exactly 
parallel to the initial learning in Experiment I: 
to every sentence completion problem used 
then there corresponded a problem for each 
group in this experiment, and the order of 
administration of problems and the learning 
criteria were the same. Subjects who failed to 
complete the initial learning in 60 trials were 
not included in the remaining parts of the 
experiment. 

The purpose of the second learning was to
introduce and provide practice with the A
term (FOJ preceded by GED) and the P term
(YAG) not used in the initial learning. Nine
problems combined these new items with 
previously used words or phrases, and with 
each other; in a further six problems, each A 
and P word (or phrase) occurred twice, once 
as the correct alternative and once presented 
in the sentence for completion. In the second 
learning, as in the initial learning, Group 1 
filled in A and P words only, and Group 2 
the full A or P phrase. 

In the learning test, each of the eight 
sentence completion problems of the initial 
learning was administered in turn to both
groups with only the A and P words to be 
filled in. For Group 1 the learning test was 
simply a repetition of the same eight problems 
they had already learned to criterion.

In the recall test, the eight words of the
language were handed to the subjects and 
they were asked to try to make a complete 
sentence of the language that they remem- 
bered. The request was repeated until they 
had offered four sentences. 

Subjects. In each group there were 12 
children, aged 9-7 to 10-7, 6 boys and 6 girls. 
Nonreaders were excluded. 


Results 


The first three lines of Table 1 
show the learning scores for the two 
groups and for the subjects of the 
previous experiment. Since some 
subjects failed to reach the learning 
criterion, the scores are expressed in 
terms of the median and range. It 
can be seen that there is no difference 
between Group 2 and the subjects 
of Experiment I in either trials-to- 
learn, or errors. 

Using the nonparametric Mann- 
Whitney sum-of-ranks test, it was 
found that Group 2 learned in fewer 
trials (z = 2.7, p < .01), and made 
fewer errors (z = 1.7, p < .05 for the 
one-tailed hypothesis), than Group 1. 

In the eight problems of the learn- 
ing test, there was only one error 
among the 12 subjects of Group 2, 
whereas 6 of the 10 subjects in Group 
1 who took the test made one error 
apiece (χ² = 4.5, p < .05). Subjects
who had learned by filling in the whole 
A or P phrase therefore knew the 
relative positions of the individual 
words of the phrase even better than 
the subjects who had been trained to 
fill in the individual words. 

In the recall test the subjects of
Group 1 constructed an average of
2.6 sentences, as against 3.1 for
Group 2. The results favor Group 2,
as predicted, but the difference is not
significant (t = 1.2). The difference
between the groups might well have
been greater if the experimenter had
not foolishly limited the subjects to
four sentences, thus imposing a ceiling
on the recall scores to the probable
detriment of the group with the higher
mean score! It should also be remembered
that the two worst subjects
in Group 1 were excluded from the
recall test because they failed to
complete the initial learning.


TABLE 1

LEARNING AND ERROR SCORES IN THE VARIOUS EXPERIMENTS

                    Percent     Trials to learn    Number errors
Experiment    N     subjects    Mdn. and range     Mdn. and range
                    learning

I             16    100         10.5 (0-28)        4   (0-7)
II: 1         12     83         24   (5-60+)       5   (2-20+)
II: 2         12    100          9.5 (2-20)        3.5 (1-9)
III           24    100         11   (0-40)        3   (0-14)
IV: 1          6     33         50+  (29-50+)      20+ (10-20+)
IV: 2         14     64         32   (4-50+)       9   (3-20+)
V             12    100         13   (0-55)        4.5 (0-18)


Discussion 


The similarity between the learning 
scores for Group 2 and for the lan- 
guage of Experiment I indicates that 
it matters little whether the elements 
in first and last position are words or 
variable length phrases (one or two 
words). 

The results also indicate that the 
same language is learned more easily 
and effectively when the response 
units are phrases of variable length 
and the positional cues are first and 
last, than when the response units are 
words and the positional cues are 
provided by the relative positions of 
the words to each other. This result 
is probably due to the greater in- 
formational economy of a first-last 
dichotomy; that is, to learn the 
absolute positions of the phrases fewer 
items of information would have to be 
registered than to learn the positional 
relationships between the words taken 
individually. Consider, for example, 
the complexity of a relative definition 
of the position of BEW: BEW precedes 


pow; it follows either KIv, JUF, or 
FOJ, whichever of these appear; and 
it occurs next but one after GED, 
when GED is present. Obviously it is 
much simpler to take BEW POW as a 
unit and state only that this occurs 
last. (Within a hierarchical scheme 
the position of BEW could then be 
specified as the first member of a 
BEW-POW unit.)²

The results of this experiment 
therefore suggest that learning is 
easier with a simple definition of 
position, and that a variable length 
element does not impair learning. 
Such a result would be expected 
under the assumption that position 
can be defined through a hierarchical 
scheme, since a hierarchical scheme 
permits a simple definition of position, 
but does so at the cost of a complex 
element. 


² One way (which would apply only to a
language more complex than the one taught 
here) in which the complexity of a relative 
definition of position could be reduced, would 
be to arrange that a few elements recur in a 
very large number of sentences of the lan- 
guage. The subject might quickly learn to 
recognize these elements and they might then 
serve him as reference points in the sentence; 
the positions immediately preceding or fol- 
lowing such elements would then be defined 
in a fairly simple manner. The familiar 
element would, so to speak, serve as a “tag” 
(cf. the discussion of “closed-class”
morphemes, below).


EXPERIMENT III: THE A + PQ
AND AB + P LANGUAGES 


The experiment was designed to 
explore the learning of locations in a 
language with a hierarchy of elements. 
Suppose a grammar in which a double 
binary fractionation of sentences is 
possible (i.e., not only a division of 
the sentence into an A and a P phrase, 
but also a further subdivision of at 
least one of the phrases into two 
parts). Such a grammar would be 
exemplified in a language with four 
A phrases, a₁, a₂, a₃, a₄ (not necessarily
of equal length), any of which
may precede any of four P phrases,
p₁q₁, p₂q₂, p₁q₂, p₂q₁. The P phrase
has internal structure, consisting of a
P word (p₁ or p₂) followed by a
Q word (q₁ or q₂). With such a
language, it was predicted that the 
subjects would not only associate 
phrases with their position within the 
sentence, but also associate words 
with their position within the phrase 
—contextual generalization should oc- 
cur both between and within phrases. 

For the above language, between-phrase
generalization would be demonstrated
in essentially the same
way as in Experiment I. At a certain
point in the learning, an A and a P
phrase (e.g., perhaps a₄ and p₂q₁),
not yet experienced by the subject,
would be introduced and presented in
problems of the form a₄( ) and
( )p₂q₁. If the subject has registered
the positions of the A and P phrases in
the previous learning, he should complete
these sentences correctly, to
form, e.g., a₄p₁q₁, a₁p₂q₁.

However, the more complex gram- 
mar brings a new factor up for con- 
sideration, not present in Experiment 
I. In a language in which some of the
phrases have internal structure, the 
component phrases of most new 
sentences will be formed by joining 
words not previously combined, and
not represent new additions to the
vocabulary. When new phrases are 
recombinations of familiar elements, 
any associations of a paired-associate 
type, which may have formed during 
the previous learning, will have an 
opportunity to facilitate generaliza- 
tion. In the case of the language 
outlined, where only one phrase has 
internal structure, such associations 
could facilitate solution only of gen- 
eralization problems in which the un- 
structured phrase is to be filled in. 
Thus in problems of the form a₄( ) in
which the structured phrase has to be
filled in, a₄ is a new addition to the
vocabulary so that no associations
can have formed to it; whereas in
problems of the form ( )p₂q₁, associations
could have formed between the
familiar unstructured A phrases to be
filled in and the words p₂ and q₁ of
the phrase presented. Analysis of 
performance on such generalization 
problems may therefore give some 
indication of the extent of associative 
effects in this kind of generalization. 

Within-phrase generalization would
be demonstrated in problems of the
form aₙp₃( ) and aₙ( )q₃, where aₙ is
any A phrase (familiar), and p₃ and q₃
are new words. If, in the previous
learning, the subject has registered
the positions of the P and Q words in
the P phrase, he should complete the
sentences correctly, to form, e.g.,
aₙp₃q₁, aₙp₁q₃. Since aₙ can be any
of the four A phrases, a generalization
problem of the above form, e.g.,
aₙp₃( ), can be replicated four times,
with a₁, a₂, a₃, a₄ serving in turn as the
A phrase; in each replication a correct
response will construct the same P
phrase (e.g., p₃q₁) in the context of a
different A phrase. One can therefore 
ask whether the association of words 
with their within-phrase positions is 
demonstrated on the first problems, 
and also whether there is an increase
in generalization with successive A
phrase contexts. An increase might 
be due either to an increase in the 
strength of association of the word 
with its within-phrase position or to 
between-phrase contextual general- 
ization of the new P phrase from one 
A phrase to another. 


Method 


Description of the languages. The two 
languages are defined in Table 2. In the 
AB + P language the structured phrase was 
in first position, in the A + PQ language in 
second position. 

Procedure. The procedure had five parts: 
initial learning, second learning, between- 
phrase generalization test, third learning, and 
within-phrase generalization tests. Since the 
languages are mirror images, the procedure 
will be described only for the A + PQ 
language; the procedure for the AB + P
language was an exact analogue—read A in 
place of P, B in place of Q, and P in place of 
A in the following description. 

The initial learning was an exact problem- 
by-problem analogue of that used in the two 
previous experiments. KIVIL and OB ORDEM
were the A phrases, and MERVO SOM (p₁q₁)
and YAG EENA (p₂q₂) the P phrases. The
subjects always filled in the whole phrase, 
in the same way as Group 2 in the previous 
experiment. The initial learning, then, was 
comparable to the first experiments, two A 
phrases and two P phrases with, as yet, no 
elaboration of internal structure. 

The second learning introduced a new A
phrase (a₃: REMIN GICE) and a new P phrase
(p₁q₂: MERVO EENA). The first nine problems
combined these new phrases with the previously
used ones and with each other; then a
random assortment of problems constructible
from the six phrases were learned to a criterion
of seven successive correct responses. The
second learning thus introduced a new A
term, unrelated to the previous A terms; the
P term, however, was systematically generated
by the combination of elements from
the previously learned P terms.


TABLE 2
A + PQ LANGUAGE

A Phrase             P Phrase

KIVIL (a₁)           MERVO (p₁)    SOM (q₁)
OB ORDEM (a₂)        YAG (p₂)      EENA (q₂)
REMIN GICE (a₃)      LECK (p₃)     WIMP (q₃)
NOOR (a₄)

Note. A pₙqₙ sequence constitutes a P phrase, and
an A phrase followed by a P phrase is a sentence. The
AB + P language is obtained by interchanging A and
P phrases of the A + PQ language.
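
The two languages of this experiment can be enumerated from the vocabulary of Table 2. The sketch below assumes that any P word may be followed by any Q word to form a P phrase, which goes slightly beyond what the text states explicitly (a phrase made of both new words, LECK WIMP, is never mentioned); the variable names are hypothetical.

```python
from itertools import product

A_PHRASES = ["KIVIL", "OB ORDEM", "REMIN GICE", "NOOR"]
P_WORDS = ["MERVO", "YAG", "LECK"]
Q_WORDS = ["SOM", "EENA", "WIMP"]

# Structured phrase: a P word followed by a Q word (assumed freely combinable).
p_phrases = [f"{p} {q}" for p, q in product(P_WORDS, Q_WORDS)]

# A + PQ language: unstructured phrase first, structured phrase last.
a_pq_sentences = [f"{a} {pq}" for a, pq in product(A_PHRASES, p_phrases)]

# Mirror-image AB + P language: the structured phrase comes first.
ab_p_sentences = [f"{pq} {a}" for a, pq in product(A_PHRASES, p_phrases)]

print(len(p_phrases), len(a_pq_sentences))
print(a_pq_sentences[0], "|", ab_p_sentences[0])
```

Under this assumption each language contains 36 sentences, of which only a small part is presented during the learning stages.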

The between-phrase generalization test 
consisted of six problems presented once each. 
The problems introduced a new A phrase 
unrelated to previous A phrases (a₄: NOOR)
and a new P phrase generated by recombination
of elements (p₂q₁: YAG SOM). These were
presented three times each in the positions 
appropriate to their class, and in each problem 
a different one of the previously used phrases 
had to be filled in. In the three problems in 
which YAG SOM was the phrase presented, the
incorrect choice was always MERVO EENA 
(because otherwise the subjects might well 
solve these problems by adopting the hy- 
pothesis that the same word could not occur 
twice in a sentence). 

In the third learning 15 problems construct- 
ible from the eight phrases were presented, 
most of which contained the two phrases just 
introduced in the generalization test. The 
purpose of the third learning was to familiarize 
the subjects with these new phrases. 

The within-phrase generalization problems 
differed from all previous parts of the pro- 
cedure in that sentences were presented lack- 
ing one word only, and in that subjects had 
to choose from three instead of two alternatives
(because there were three parts of speech). 
Preceding the first generalization problem 
there were three “buffer” problems with these 
characteristics which contained no new 
words. There were 16 within-phrase gen- 
eralization problems; these were arranged 
in four sets, each set comprising four problems. 
In two problems of each set, the P phrase 
presented was p₃ (LECK) followed by the
space to be filled in; in the other two problems
it consisted of q₃ (WIMP) preceded by the
blank; the choices were always a P word, a
Q word, and a word selected from one of the
A phrases; each of p₁, p₂, q₁, q₂ was correct
once. In the first set of problems the A
phrase used was always a₁, in the second set
a₂, in the third a₃, and in the fourth a₄. Thus,
in the first set the problems were: KIVIL LECK
____ (ORDEM, SOM, MERVO); KIVIL ____ WIMP
(YAG, EENA, REMIN); KIVIL ____ WIMP (SOM,
GICE, MERVO); KIVIL LECK ____ (ORDEM,
YAG, EENA). In the second, third, and fourth
sets of problems, in addition to replacement
of the A phrase, the order of presentation of
problems containing particular P phrases was
varied, as also were the wrong alternatives. 
To give the subjects additional experience of
the sentences used in each set of generalization
problems before going on to the next set, 
between sets the subjects were given eight 
problems made up from the four sentences 
introduced in the set just completed. In 
these practice problems, the whole A or P 
phrase had to be filled in, as in the earlier 
learning. 

If two sessions were required to complete 
the procedure, the break between sessions 
always occurred during the second learning. 

Subjects. Twenty-four children, aged 9-11 
to 11-1, served as subjects. Each language 
was learned by 6 boys and 6 girls. Nonreaders 
were excluded. 


Results 


Table 1 shows the initial learning 
scores for the 24 subjects, who were 
treated as a single group since the 
scores were very similar for the two 
languages. It can be seen that the 
scores are almost exactly the same as 
those found in Experiment I and for 
Group 2 in Experiment II. 

In 74% of the between-phrase 
generalization problems, the subjects 
filled the vacant position with the 
phrase which had occupied this posi- 
tion in the previous learning. Using 
the binomial expansion, one would 
expect that among 24 subjects, 8.25 
subjects would be correct on either 
zero, one, or two of the six problems, 
7.5 subjects correct on three problems, 
and 8.25 subjects correct on four, five, 
or six problems. The obtained figures 
were 2, 3, and 19 subjects (χ² = 21.8,
p < .0001). Since, as described in 
the procedure, three of the problems 
had the same wrong alternative 
(MERVO EENA), there was the possi- 
bility that subjects might have solved 
some of these by avoiding this couplet. 
This possibility is ruled out by the 
fact that the subjects made slightly 
fewer errors on the first of the three 
problems than on the other two, and 
also that they did not make more 
than the average number of errors 
on the one problem on which this 
couplet was correct.

It will be remembered that in the
AB + P language the structured
phrase was in first position, and in 
the A + PQ language in final position. 
The 12 subjects who learned the 
AB + P language correctly placed the 
structured phrase in first position in 
83% of the 36 relevant problems 
(3 problems per subject); they cor- 
rectly placed the unstructured phrase 
in final position in 69% of the prob- 
lems. The corresponding figures for 
the subjects who learned the A + PQ 
language were 75% for the un- 
structured phrase in first position, 
and 69% for the structured phrase in 
final position. These four percentages 
do not seem to differ from each other. 

On the first set of within-phrase 
generalization problems the subjects 
filled 68% of the vacant positions 
with the words that had previously 
occupied these positions (as against 
the 33% expected by chance). Among 
24 subjects one would expect from the 
binomial expansion that 14.2 subjects 
would make errors on all or all but one 
of the four problems, and that 9.8 
subjects would make two or more 
correct responses. The obtained fig- 
ures were 4 and 20, and the difference 
is highly significant (χ² = 18.0,
p < .0001). Despite the small num- 
ber of subjects for each language, the 
tendency to respond according to 
word position on the first set of 
within-phrase generalization problems 
is significant for each language in- 
dividually (by a similar chi square 
test using Yates’ correction). 
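
The two expected distributions quoted above can be
checked directly. The sketch below is an illustration,
not part of the original analysis. It assumes a chance
probability of 1/2 on each between-phrase problem (an
assumption that reproduces the reported expectations of
8.25, 7.5, and 8.25) and of 1/3 on each within-phrase
problem (the 33% stated in the text), and it uses an
uncorrected Pearson chi square. The function names are
hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_counts(n_subjects, n_problems, p_chance, bins):
    """Expected number of subjects falling in each bin of 'number correct'."""
    return [
        n_subjects * sum(binomial_pmf(k, n_problems, p_chance) for k in group)
        for group in bins
    ]


def pearson_chi_square(observed, expected):
    """Uncorrected Pearson goodness-of-fit statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Between-phrase problems: 24 subjects, 6 problems, assumed chance = 1/2.
between_expected = expected_counts(24, 6, 0.5, [(0, 1, 2), (3,), (4, 5, 6)])
between_chi2 = pearson_chi_square([2, 3, 19], between_expected)

# Within-phrase problems (first set): 24 subjects, 4 problems, chance = 1/3.
within_expected = expected_counts(24, 4, 1 / 3, [(0, 1), (2, 3, 4)])
within_chi2 = pearson_chi_square([4, 20], within_expected)

print("between-phrase expected:", [round(e, 2) for e in between_expected])
print("between-phrase chi square:", round(between_chi2, 2))
print("within-phrase expected:", [round(e, 2) for e in within_expected])
print("within-phrase chi square:", round(within_chi2, 2))
```

Under these assumptions the expected counts agree with
those given in the text. The within-phrase statistic
comes out close to the reported 18.0; the between-phrase
statistic comes out near 21.4, slightly below the
reported 21.8, and the text does not give enough detail
to account for the difference.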

In Figure 1 the results on all four 
sets of within-phrase generalization 
problems are presented, showing for 
each set the proportion of times that a 
given within-phrase position was cor-
rectly filled. It can be seen that
correct responses occurred somewhat
more frequently on the later sets than
on the first, suggesting that some
generalization from one sentence con-
text to another occurred over and
above the within-phrase generaliza-
tion manifested by the better-than-
chance success on the first set. Com-
paring the within-phrase positions
with each other, Figure 1 suggests
that the tendency to associate a word
with its position may be greatest for
the Q position in the A + PQ lan-
guage (i.e., for the last word in the
sentence), and about the same for the
remaining three positions.

Fig. 1. Proportion of sentences in each set
of four within-phrase generalization problems
in which a given position was correctly filled
(e.g., the open squares indicate how often the
P position in the A + PQ language was filled
with a P word).

When asked how they knew what 
to pick, three subjects said something 
about position (e.g., “I don’t know 
exactly; this one usually starts it”); 
the remaining subjects showed no 
insight. The most frequent answer 
made some reference to the sound of 
the sentence (e.g., “I don’t know;
I just put it because it sounds right”);
a number of subjects said that they
“remembered,” and it was obvious
that most subjects were quite unaware
that many of the new sentences were
new. One subject said she “tried
them all out and picked the one that
made the most sense”!


Discussion 


The similarity of the initial learning 
scores to those in Experiment I, and 
for Group 2 in Experiment II, in- 
dicates that ease of learning is not 
related to such factors as number of 
words per phrase, number of syllables 
per word, exact phonetic or lexical 
constitution of the words, etc., at 
least within the limits that these 
factors were varied. Similarly, the 
results on the between-phrase general- 
ization problems confirm and extend 
the observations of Experiment I, 
in that they provide evidence that 
phrases as well as words tend to 
become associated with the sentence 
positions in which they recur, and 
thus to generalize to fresh contexts. 
Both these findings indicate that, 
within fairly wide limits, the con- 
stitution of the elements in first and 
last position is not an important 
variable for either learning or gen- 
eralization. 

It was noted in the introduction to 
the experiment that paired-associate 
links, formed between phrases and 
parts of phrases during learning, 
might act to facilitate solution of the 
between-phrase generalization prob- 
lems in which the unstructured phrase 
was to be filled in. However, no 
evidence was found for such a facilita- 
tive effect, since the unstructured 
phrase was not filled in correctly more 
often than the structured phrase. 

The results on the within-phrase 
generalization problems indicate that 
during learning the component words 
of the structured phrase become 
associated with their positions within 
the phrase. This finding confirms the 
assumption that elements can be hier- 
archically structured for the subject, 
that is, the locations with which
expressions become associated can be
locations within longer expressions 
whose location in the sentence is also 
learned. 

This experiment completes the ex- 
periments which explore ways in 
which the structure of the verbal 
array may define positions for the 
subject. In Experiment IV the 
question of the role of the oral-audi- 
tory cycle in this kind of learning is 
given further consideration. 


EXPERIMENT IV: POSITIVE AND 
NEGATIVE INSTANCES 


It was suggested earlier that the 
relevant positional cue was the tem- 
poral position of the word in the 
sentence as spoken, and not the left- 
right position in the visual display. 
It was also suggested that, since only 
the correct sentences were read aloud, 
the subjects had much more op- 
portunity to become familiar with the 
sound of the correct sentences, their 
experience of ungrammatical sentences 
being confined entirely to subvocal 
rehearsal of the wrong answers to 
problems. In choosing between the 
alternatives on a given problem, the 
subjects would therefore be able to re- 
hearse each alternative and select the 
one that sounded “right,” i.e., fa- 
miliar. Even in the subvocal rehearsal 
it is likely that the correct alternative 
would be rehearsed more often than 
the incorrect one, since if the subject 
happened to try the correct alter- 
native first it would sound right, and 
the incorrect alternative might not 
then be rehearsed. 

In ordinary discrimination-learning 
experiments, matters are usually ar- 
ranged so that the subject has equal 
experience with both the positive and 
negative stimuli. In the previous 
three experiments, if the subjects’ 
method of solution has been correctly 
interpreted, there was very unequal
experience with correct and incorrect
sentences, exposure to positive in- 
stances greatly predominating. First 
language learning by infants, it may 
be noted, occurs through exposure to 
positive instances almost exclusively. 

The following modifications of the 
procedure used in Experiment I were 
made both to test the above inter- 
pretation of the subjects’ method of 
solution, and also to explore the 
possible importance for language 
learning of the distinction between 
learning to discriminate positive from 
negative instances, and learning pri- 
marily through exposure to positive 
instances. It was predicted that 
learning would be seriously impaired 
if the training procedure were altered 
so that the subjects had nearly equal 
familiarity with correct and incorrect 
sentences. Two procedures were de-
signed, each modifying Experiment I
in a slightly different way. The 
language in each case was the A + P 
language with word constituents, de- 
scribed for Experiment I. 


First Modification 


Procedure. The preliminary instructions, 
the rewards, the learning criteria, the sequence 
of presentation of problems, and the actual 
words used in each problem were exactly the 
same as in Experiment I. The difference lay 
in the fact that each problem was not pre- 
sented as a sentence completion problem; 
instead the correct and incorrect sentences 
involved in a problem were each written on 
separate cards, and were presented to the 
subject one on his left and one on his right. 
He was instructed to read each one aloud and 
then to point to the one which “could be said 
in this language.” He was told whether he 
was right, but neither sentence was read aloud 
following his response. 

Six children, aged 9-6 to 10-7, served as 
subjects. 


Results. The results are shown in 
Table 1 (Row IV:1). Only two sub- 
jects reached the learning criterion, 
and both these required more trials 
and made more errors than any sub-
ject in Experiment I. This procedural
modification obviously impairs learn- 
ing very severely indeed. 


Second Modification 


Procedure. While the preceding modifica- 
tion probably gave the subjects equal exposure 
to the sound of the correct and incorrect 
sentences, it also altered the visual display, 
and the precise nature of the response 
demanded of the subject. The modification 
now described maintained the same visual 
display and the same response as in Ex- 
periment I. 

The procedure was an exact duplicate of 
that in Experiment I, except that in each 
problem before completing the sentence, the 
subject read aloud the sentences which would 
be produced by substituting each alternative 
in turn in the vacant position. After com- 
pleting the sentence he was told whether he 
was correct, and if incorrect was shown the 
correct sentence, but he did not then read it 
aloud. If he read it aloud spontaneously he 
was told not to; the experimenter did not 
read the sentence aloud either. After a 2- 
second pause the next problem was set up. 

Since each problem terminated with the 
correct sentence exposed visually to the 
subject, and since literate subjects have a 
tendency to read to themselves written words 
they see, there was probably somewhat more 
exposure (subvocal) to the correct than to the 
incorrect sentences. However, there was 
certainly far less differential exposure than in 
Experiment I. 

The subjects were 14 children, aged 10-1 to 
10-10. 


Results. The results are again 
shown in Table 1 (Row IV:2). While
the impairment of learning appears to 
be a trifle less serious with this pro- 
cedure than with the preceding one, 
learning is obviously greatly impaired 
relative to the procedure used in Ex- 
periment I. For both trials-to-learn 
and errors, the difference is highly 
significant (p < .001, and .0001, re- 
spectively, by the Mann-Whitney 
sum-of-ranks test). 


Discussion 


Since the difference between the 
second modification and the procedure
used in Experiment I involved only
the reading of the sentences, the 
results strongly support the previous 
interpretation of the basis of the 
subjects’ responses. That is, the 
relevant cue was the temporal position 
in the spoken sentence, and as learning 
progressed words or phrases came to 
sound familiar in the positions in 
which they recurred. The subjects 
were then able to respond correctly in 
generalization problems by picking 
the alternative which made the sen- 
tence “sound right.” 

It may be added that the deleterious 
effect on grammatical judgments of 
self-exposure to negative instances is 
a phenomenon within ordinary ex- 
perience. For example, if a foreigner 
asks whether DIFFERENT TO, DIFFER-
ENT THAN, or DIFFERENT FROM is
correct in English, or whether a 
sentence like THE CHILD SEEMS SLEEP- 
ING is any less correct than THE 
BOOK SEEMS INTERESTING, one is 
usually much more confident of the 
answer if one responds at once than if 
one repeats each alternative 20 times 
before responding. The very act of 
repeating the ungrammatical (or less 
grammatical) sequence a number of 
times seems to make it momentarily 
“sound right,” and thus removes the 
usual basis for the judgment. 


EXPERIMENT V: THE A + P 
LANGUAGE WITH 
4-YEAR-OLDS 


Experiments I-III were all per-
formed with children about 10 years
old. Since the theory developed 
claims to have relevance to first 
language learning, it assumes that the 
phenomena under study also occur 
readily in much younger children. 
Experiment V was designed to ex-
amine this assumption. An attempt
was made to repeat Experiment I 
with preschool subjects making only
such procedural changes as were
necessitated by their illiteracy. 


Method 


Procedure. The following modifications 
were made to the procedure used in Experi- 
ment I with the older children. (a) Instead 
of nonsense syllables, animal noises served as 
the “words” of the language; “sentences” 
were thus sequences like MOO MEOW, OINK 
QUACK, etc. (b) Instead of being written 
each word was graphically represented by a 
picture of the animal or of its head, with the 
mouth open as if to talk. (c) The sentences 
were “read” from top to bottom rather than 
from left to right, i.e., a sentence was com- 
pleted by placing a word (picture) in the 
vacant position above or below the pre- 
sented word; this modification was intro- 
duced because it was thought that preschool 
children would more easily adopt a con- 
sistent top-to-bottom than left-to-right direc- 
tion of reading. (d) A correct response was 
rewarded with a raisin, and if the subject 
wished he could exchange four raisins for an 
M & M candy. (e) There were several ses- 
sions rather than one; these were kept short 
and second and subsequent sessions always 
began with one presentation of each of the 
first four problems; responses to these prob- 
lems were not tallied if the subject had met 
the learning criterion on these problems in 
prior sessions. 

Before the training began, a consensus, 
determined by the experimenter, was reached 
for each picture as to “what the animal said.” 
Then the first problem was presented several 
times over and used as a vehicle for teaching 
the children to read the pictures smoothly in 
the top-bottom sequence; on subsequent 
problems if the child had difficulty reading 
the sequence after his response, he was 
helped until he had read it at least once 
smoothly. 

In addition to the above, a further modifica- 
tion had to be introduced. On problems in 
which the word presented (e.g., MOO) was in
top position, the child was instructed before
his response: “Does it go MOO ..., or
MOO ...?”; during the first brief pause
the experimenter indicated one alternative 
with the end of his pencil, and during the 
second the other alternative. When the word 
presented (e.g., QUACK) was in bottom posi- 
tion, the instruction was “Does it go... 
QUACK, or ... QUACK?” with similar in- 
dication of the alternatives. The purpose of 
the instruction was to encourage some sort of 
rehearsal of the alternative sentences. It
was necessary because a picture appears to
elicit vocal or subvocal rehearsal of a par- 
ticular word less readily than does the written 
word. The experimenter was careful not to 
give voice or gestural cues which could guide 
the subject to the correct response. 

Subjects. The subjects were 12 nursery- 
school children, aged 4-2 to 5-0. 


Results 


The learning scores are shown in 
Table 1. All subjects learned,³ and
it can be seen that the median scores 
for both errors and trials to learn are 
very similar to those of the older sub- 
jects of Experiment I. Although the 
upper end of the range is higher, two 
subjects completed the initial learning 
without any errors whatever, just as 
in Experiment I. On 75% of the 
generalization problems the subjects 
filled the vacant position with the 
word that had occupied this position 
in the previous learning. This figure 
is close to the figure of 78% found in 
Experiment I. Vacant first and second 
positions were filled in about equally 
often. One subject gave a crude but 
accurate statement of the principle 
involved. 

While the procedure necessarily 
differs in too many ways from that of 
Experiment I to permit firm con- 
clusions to be drawn from the com- 
parison, it can nevertheless be stated 
that the preschool subjects did not 
find this task substantially more 
difficult than the older subjects found 
the analogous task in Experiment I. 


³ The procedure proved not to be well
adapted to some 4-year-olds. The initial
teaching, the preliminary instruction to each 
problem, and the requirement of reading the 
picture sequences involved controls on the 
subjects’ behavior which tended to arouse 
their opposition. This was generally handled 
by keeping the sessions brief, but neverthe- 
less four subjects refused to cooperate after 
experiencing only one or two sessions. These 
subjects are omitted from the results (they 
were not learning more slowly than the others 
when they stopped).

Moreover, contextual generalization
occurred in both age groups. While 
a comparison of age groups on a 
relevant task which involved neither 
reading nor simulation of reading in 
either group would be more satis- 
factory, the results nevertheless pro- 
vide prima facie evidence of age-group 
similarities in the learning processes 
involved. 


GENERAL DISCUSSION 


The remainder of the paper will 
consider how far a theory based on 
contextual generalization may provide 
a plausible account of the learning of 
the grammatical structure of natural 
languages, particularly English. As 
developed in the preceding experi- 
ments the theory consists of three 
proposals. (a) “What is learned” are 
the locations of expressions in utter- 
ances. (b) Units (i.e., expressions 
whose position is learned) can form a 
hierarchy in which longer units con- 
tain shorter units as parts, the location 
that is learned being the location of a 
unit within the next-larger containing 
unit, up to the sentence. (c) The 
learning is a case of perceptual learn- 
ing—a process of becoming familiar 
with the sounds of expressions in the 
positions in which they recur.⁴


⁴ Certain cues not involved in the experi-
ments could facilitate the learning of position
in natural languages. In English, for example,
various suffixes are associated with many
nouns and verbs (e.g., -MENT, -ATION, -NESS,
etc., and -ATE, -IZE, etc.). Also, many of the
more frequent nouns (especially those without
suffixes) are correlated with “object-like”
features in the external world, and many
verbs with “process-like” features. That
quite young children know these correlates
has been demonstrated by Brown (1957).
Where they covary with position, all such 
cues, although only probabilistic, should 
facilitate the learning of the locations of 
expressions. (In the experiments it is likely 
that learning and generalization would have 
been markedly assisted if the words belonging 
in a given position had been “tagged” in

With respect to Proposal (b), prob-
ably the simplest hierarchical scheme
is provided by the binary-fractiona- 
tion model. This model, in which 
position is defined through successive 
first-last dichotomies, is the one used 
in the design of Experiments II and 
III, and can serve as a basis for 
discussion where a specific model is 
required. 
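
As an illustration only (the representation is an
assumption, not the author's notation), a sentence in
the binary-fractionation model can be written as nested
first-last pairs, and each word's position read off as
the sequence of dichotomies leading to it. The function
name and the tuple encoding below are hypothetical; A,
B, P, and Q are the class labels used for the
experimental languages.

```python
def positions(unit, path=()):
    """Yield (element, path) pairs for a nested (first, last) structure.

    A unit is either a string (a single element) or a 2-tuple whose
    members are themselves units. The path records, from the sentence
    downward, whether the element lies in the first or the last part
    at each successive dichotomy.
    """
    if isinstance(unit, str):
        yield unit, path
    else:
        first, last = unit
        yield from positions(first, path + ("first",))
        yield from positions(last, path + ("last",))


# The A + PQ language: an unstructured first phrase, a structured last phrase.
a_pq = dict(positions(("A", ("P", "Q"))))

# The AB + P language: a structured first phrase, an unstructured last phrase.
ab_p = dict(positions((("A", "B"), "P")))

for name, located in (("A + PQ", a_pq), ("AB + P", ab_p)):
    for element, path in located.items():
        print(name, element, " > ".join(path))
```

In this encoding the Q element of the A + PQ language is
located as "last, then last", that is, last within the
last phrase, which is the sense in which a position is
defined within the next-larger containing unit.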

One direct way in which the theory 
might be examined is by collecting 
data on the development of gram- 
matical structure. The first word 
combinations of three children 18-24 
months old studied by the writer have 
been shown to have a characteristic 
structure which was interpreted as 
indicating that what had been learned 
was that each of a small number of 
words belonged in a particular posi- 
tion in a word combination (Braine, 
1963). Occasional reference to the 
subsequent development of these 
children will be made in the dis- 
cussion. 

The discussion will be concerned 
with some obvious facts about the 
structure of natural languages, Eng- 
lish especially, which raise difficulties 
for the theory proposed. Each diffi- 
culty will be discussed in turn. Since 
it is hardly to be expected that the 
learning of all of the many kinds of 
grammatical regularities that exist 
will prove amenable to interpretation 
in terms of the above proposals, a 
purpose of the discussion will be to 
determine the range of phenomena 
to which the theory can hope to apply, 
and suggest ways of extending its 
scope. 


some way, e.g., by giving them all the same
ending.)


Contrasting Word Order


Of the conceivable arrangements of
any given set of morphemes, usually
only a few are grammatically possible
(e.g., THE MAN is possible in English,
whereas MAN THE is not); the word-
order is said by linguists to be
“restricted” to those that can occur
(e.g., Harris, 1951, p. 184). When 
two (or more) arrangements are 
grammatically possible, sometimes the 
two orders are equivalent (e.g., BOYS 
AND GIRLS, GIRLS AND BOYS), in which 
case the order is said to be “free”; 
more often the two orders are non- 
equivalent (e.g., GEORGE HIT JOHN, 
JOHN HIT GEORGE, or THE CHILD IS
INSIDE THE CAR, THE CAR IS INSIDE 
THE CHILD), in which case the orders 
are said to “contrast.” 

In mastering distinctions associated 
with contrasts between word orders, 
the child probably learns to respond 
to relationships quite different from 
any suggested so far in this paper. 
For example, in English both the 
sequences Noun + INSIDE, and IN- 
SIDE + Noun can occur; to use them 
appropriately the child must presum- 
ably learn to place INSIDE before the 
word for the container and after the 
word for the contained object. One 
of the children the writer has studied 
uttered both verb-noun and noun- 
verb sequences at about 24 months 
of age, but he had apparently not 
learned that the agent of an action 
goes before the verb, and the object 
of the action typically after it, so that 
where English word order contrasts, 
his word order was free. Thus, among 
Gregory’s utterances one finds CARRY
MOMMY (where Mommy is to do the
carrying), COMES ELEVATOR, TRUCK
FIX, FALL DOWN RABBIT, as well as the
normal English word order, e.g.,
DADDY FIX, MOMMY COMES, etc. Ap-
parently the agent-action sequence
is not necessarily primitive in the 
English sentence but can develop, at 
least in some children, as a polariza- 
tion of a sequence which is initially 
more or less random. In a similar way
one finds Gregory saying INSIDE
DOCTOR Z--- (meaning that Dr.
Z--- is inside his office), POCKET IN-
SIDE, as well as the normal order, e.g.,
CANDY INSIDE, INSIDE POCKET. 

It seems clear that the theory 
developed is not relevant to the learn- 
ing of contrasts between word orders, 
and that its scope must be confined to 
the learning of restrictions on word 
order. This exclusion from the theory 
may be quite far-reaching: it seems 
probable that features of English 
word order which have to do with the 
difference between transitive and in- 
transitive verbs, and with the dis- 
tinction between the prepositional and 
adverbial use of words like INSIDE, ON, 
OFF, DOWN, etc. (e.g., THE CAR OUT- 
SIDE, OUTSIDE THE CAR, or THE LIGHT 
[is] ON, ON THE TABLE) may not 
develop until the appropriate con- 
trasts have been learned. 


Cues to the “Middle” of Utterances 


If, as claimed, children learn 
the sentence-positions of words and 
phrases, they must have some way of 
identifying where in the sentence the 
initial phrase ends and the next phrase 
begins. In the experiments described, 
the boundary was clearly defined by 
the procedure: the subjects filled in 
the entire first or last phrase. (In 
the one experiment where this was not 
true learning was impaired.) But 
what cues define borders between 
constituents for the 2-year-old learn- 
ing the natural language? 

One can only speculate about 
possible answers to this question. 
Two major types of cue offer them- 
selves for consideration. The first 
have to do with intonation. By “in- 
tonation” is meant the variety of 
phenomena referred to by such terms 
as “stress,” “pitch,” “juncture,” “off-
glide,” “on-glide,” “contour,” “super-
fix,” “intonation pattern,” etc. Just
how much information about utter-
ance structure is given by intonation
is currently a matter of controversy in 
the linguistic literature. According to 
Trager and Smith (1951, pp. 67-77), 
the segmentation of an utterance into 
parts, and of the parts into parts, is 
almost completely specified by fea- 
tures of intonation. If this is true, 
most of the necessary information 
about the location of the boundaries 
between positions is contained within 
an utterance as spoken. However, 
the extent of the correlation between 
intonation phenomena and grammat- 
ical structure has been seriously ques- 
tioned (e.g., Bolinger, 1957). More- 
over, Trager and Smith's analysis 
assumes the existence of 12 phonemes 
which have to do with stress, pitch, 
and juncture. Recent work indicating 
that pitch and not intensity is the 
main cue to stress (Bolinger, 1958) 
would appear to put this phonemic 
analysis in jeopardy. It is doubtful, 
however, that anyone would claim 
that intonational phenomena provide 
no information about the partition of 
utterances. 

That intonation does provide infor- 
mation about segmentation is shown 
by the following demonstration. A 
five-word nonsense sentence, OB OR- 
DEM KIVIL MERVO EENA was recorded 
several times over, spoken in two 
ways. In one case the sentence was 
spoken with definite primary stress on 
the first syllables of the second and 
last words; in the other case stresses 
were placed on the third and last 
words; care was taken that there 
should be no special pause between 
any pair of words. Five people on the 
institute staff listened to each form of 
the sentence and wrote down a five- 
word English sentence which seemed 
to them grammatically similar to the 
model. It was predicted that the 
sentence would be heard as composed
of two parts, with the point of division
located immediately following the 
word carrying the first main stress. 
Despite the crudity of the stimuli, the 
prediction was consistently borne out. 
In half the sentences a subject- 
predicate division was placed at the 
predicted mid-point (e.g., THE LADY / 
GOES INTO TOWN, THE ALGEBRA 
CLÁSSROOM / SEATED TWENTY); in four
sentences an initial prepositional
phrase was separated off (e.g., FOR
SWEETENING / SHE PREFERRED SUGAR,
IN THIS ORDER / WORDS FLY); in
the remaining sentence the midpoint
divided the predicate-phrase (THEY
ARE GOING / RIGHT AWAY; with a
pronoun subject in English the first
main stress tends to shift forward).
If the length of the juncture between 
words, as well as the location of the 
first main stress, were manipulated, 
it may well be that the subjects could 
be led to be more consistent in the 
type of division they hear. 

In addition to intonational phe- 
nomena, a role in defining the im- 
mediate constituents may well be 
played by the closed-class morphemes 
(i.e., articles, prepositions, auxiliary 
verbs, etc.—these classes are called 
“closed” because they contain only 
a small number of morphemes which 
change relatively slowly in the history 
of a language, whereas the “open” 
classes, i.e., nouns, verbs, and adjec- 
tives, contain enormous numbers of 
members and show a relatively fast 
turnover historically). Computer pro- 
grams which analyze grammatical 
structure rely completely on the 
closed-class morphemes (Klein & 
Simmons, 1961). How much use the 
child makes of them probably depends 
on his level of development, since 
some of the closed-class morphemes 
tend to develop relatively late. How- 
ever, even in the early stages of 
development when the child may not
differentiate them clearly, they may
nevertheless provide boundary mark- 
ers for the utterance constituents. 
In ordinary conversation these mor- 
phemes are typically unstressed and 
poorly differentiated phonetically (cf. 
/wonagow/ for WANT TO GO, /apiysa- 
kukiy/ for A PIECE OF COOKIE, /ola/ 
for ALL THE, etc.); conceivably they 
may serve in the early stages to 
provide a noise-filled physical separa- 
tion between the stressed (and there- 
fore phonetically clearer) meaning- 
bearing open-class words in the 
utterances the child is exposed to. 
Some support for this speculation is 
provided by one of the children 
studied by the writer at 24-27 months 
of age. Steven used two elements like 
these morphemes: one, which will be 
written UH, usually took the phonetic 
form /a/; the other, which will be 
written DI consisted of /d/ followed by 
a front or central vowel. Although 
they were devoid of meaning, definite 
rules seemed to govern Steven’s use 
of these elements. Both could occur 
at the beginning or middle of almost 
any utterance; when one of them 
occurred at the middle of the utter- 
ance, it seemed to mark the bound- 
ary between immediate constituents. 
Thus, in the corpus of Steven's 
utterances at this time there is a 
sequence UP UH TOP (i.e., “up on 
top”), and another UP TOP UH BETTY
(i.e., “up on top of Betty”), where UH
appears to have shifted forward to the
phrase division. Similarly, there is
BETHY DI SLEEPY BED (i.e., “Bethy’s
asleep in bed”), and NO UH DADDY
SLEEPY (i.e., “Daddy’s not asleep,”
or, “I don’t want Daddy to sleep”), 
where there is again a shift to the 
phrase division. While Steven was 
the only one of the three children 
studied to develop elements of this 
kind, his apparent use of them as 
boundary markers may indicate how
very young children tend to hear some
closed-class morphemes, even if they 
do not incorporate them in their own 
speech in this way. 


Positional Regularities in English? 


Since colloquial English constitutes 
both the terminal point of the child’s 
own development and the verbal 
environment in which he develops, a 
theory which proposes that what is 
learned are the locations of words 
must suppose that rigid rules define 
what parts of speech can occur in 
what phrase- and sentence-positions 
in ordinary English grammar. 

In discussing the extent to which 
such a correlation exists, it is con- 
venient to follow recent work in lin- 
guistics which divides English gram- 
mar into two parts. According to 
Harris (1957) and Chomsky (1957), 
the grammar of a language can be 
hierarchized into an elementary part, 
called the “kernel” of the language, 
and a second part which consists of a 
set of transformational rules for de- 
riving complex sentences from simple 
ones. The kernel grammar contains 
the definitions of the main parts of 
speech and describes rules for con- 
structing simple declarative state- 
ments without complex noun or verb 
phrases. The transformational rules 
then carry these kernel sentences into 
other sentences, or into phrase or 
clause segments of sentences, which 
could not be derived in the kernel 
grammar. Thus from the kernel of 
English one could generate THE MAN 
IS BITING THE DOG. The transforma- 
tional rules would then show how to 
turn this into the passive, or into the 
negative, or into any of several 
questions (e.g., WHY IS THE MAN
BITING THE DOG?), or into a relative
clause (e.g., . . . WHO IS BITING THE
DOG . . .), or into a noun phrase (e.g.,
THE MAN'S BITING THE DOG, as subject
perhaps of IS CAUSE FOR SURPRISE),
etc. 
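
The relation between a kernel sentence and its transforms can be sketched in a few lines of Python. This is only an illustration of the idea, not the formal apparatus of Harris or Chomsky: the `Kernel` record, its field names, and the string-building functions are all hypothetical, and the participle forms are supplied by hand rather than derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    """A simple declarative sentence, already segmented (hypothetical representation)."""
    subject: str             # e.g. "THE MAN"
    auxiliary: str           # e.g. "IS"
    present_participle: str  # e.g. "BITING"
    past_participle: str     # e.g. "BITTEN"
    obj: str                 # e.g. "THE DOG"


def declarative(k):
    return f"{k.subject} {k.auxiliary} {k.present_participle} {k.obj}"


def negative(k):
    return f"{k.subject} {k.auxiliary} NOT {k.present_participle} {k.obj}"


def passive(k):
    # Subject and object exchange places; the verb takes BEING + past participle.
    return f"{k.obj} {k.auxiliary} BEING {k.past_participle} BY {k.subject}"


def why_question(k):
    # Subject and auxiliary are inverted and preceded by the question word.
    return f"WHY {k.auxiliary} {k.subject} {k.present_participle} {k.obj}?"


def relative_clause(k):
    # The subject is replaced by a relative pronoun.
    return f"WHO {k.auxiliary} {k.present_participle} {k.obj}"


def noun_phrase(k):
    # The subject becomes possessive and the auxiliary is dropped.
    return f"{k.subject}'S {k.present_participle} {k.obj}"


kernel = Kernel("THE MAN", "IS", "BITING", "BITTEN", "THE DOG")
for transform in (declarative, negative, passive, why_question,
                  relative_clause, noun_phrase):
    print(transform(kernel))
```

Each function takes the same kernel sentence as input, which is the sense in which the transforms are derived from the kernel rather than built independently.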

Obviously no set of rules defining 
the positions of words and phrases in 
simple declarative sentences like THE 
LIGHT IS ON or GEORGE WALKED 
ACROSS THE STREET, will also fit the 
part-of-speech positions in complex 
sentences of transformational origin 
like SHE FOUND THEM HELPING THE 
MAN MAKE IT GO with its successive 
noun-verb sequences, or THE BOY'S 
PUSHING THE LAWNMOWER WOKE THE 
BABY where the arrangement of words 
in the initial noun phrase is more like 
that of a normal sentence than of an 
ordinary noun phrase. Correlations 
of parts of speech and sentence posi- 
tions must therefore be discussed 
separately for kernel sentences and 
for the derived transforms. Within 
the simple declarative kernel sentence 
there appears to be a standard ar- 
rangement of words and phrases which 
is normal for this type of sentence. 
(Whether this arrangement actually 
shows the detailed correlations of 
words and phrases with positions 
which is required by the theory pro- 
posed will be considered in the next 
section.) Similarly, in each type of 
transform there seem to be definite
rules governing the positional arrangement
of parts of speech which is
standard for that type of transform.
For example, all interrogatives are 
constructed according to a plan which 
is standard for interrogatives; simi- 
larly questions beginning with WHAT, 
WHERE, WHY, WHEN, HOW, WHO 
have a common arrangement; relative
clauses follow one of two main ar- 
rangements according to whether the 
pronoun is object or subject of its 
verb; transforms which occupy noun 
positions in other sentences follow 
one of several standard arrangements 
(e.g., THE BOY'S PUSHING THE LAWNMOWER,
THE LIGHT’S BEING ON, HIS
BRAKING THE CAR AT THAT MOMENT,
etc., or THE READING OF BOOKS, THE
ACTING OF PLAYS, etc., or FOR THE
LIGHT TO BE ON, FOR HIM TO BRAKE
THE CAR AT THAT MOMENT, etc., and
several other forms); the same is true
for the several other types of trans- 
form. In general, therefore, within 
the kernel and within each of the 
various types of transform there is a 
standard arrangement of parts of 
speech; the various standard arrange- 
ments are, however, all different from 
each other. 

One way to formulate this state of 
affairs would be to regard English not 
as a unitary language, but as a family 
of sublanguages. The sentences in 
some of the sublanguages are complete 
English sentences; in others they do 
not occur alone, being merely clauses 
or phrases of English. The sub- 
languages differ in that the sequential 
arrangement of morphemes in each 
sublanguage is peculiar to that sub- 
language. The languages have in 
common the fact that they all share 
the same vocabulary, and that any 
class of words which constitutes a part 
of speech in one sublanguage con- 
stitutes a part of speech in all the 
others. Moreover there are sentence- 
by-sentence correspondences between 
the sublanguages: to every sentence 
in each of the transformations there 
corresponds a sentence in the kernel 
(e.g., to JOHN WAS HIT BY GEORGE 
in the passive transformation, there 
corresponds GEORGE HIT JOHN in the 
kernel). However, the converse, that 
to every sentence in the kernel there 
corresponds a sentence in each transformation,
is not true since many of
the transformations are defective vis-à-vis
the kernel (e.g., there are no
passive transforms of kernel sentences 
containing the verb TO BE or in- 
transitive verbs). The kernel there- 
fore has a privileged status since the 
sentences in each sublanguage apparently
constitute a mapping of
some large group of sentences of the 
kernel (cf. Harris, 1957, Footnote 61). 
It is worth noting that the changes in 
arrangement made in the various 
transformations are usually rather 
minor ones, and much of the normal 
word order of the kernel grammar 
tends to be retained (e.g., in the 
interrogative there is merely an in- 
version of subject and auxiliary verb; 
noun transforms like THE BOY'S
PUSHING THE LAWNMOWER differ little 
in arrangement from the kernel sen- 
tences to which they correspond, i.e., 
THE BOY WAS PUSHING THE LAWN- 
MOWER, etc.). 

In attempting to account for the 
child’s learning of this intricate set 
of structures, it seems to the writer 
that it would be sound strategy to 
aim first at finding an explanation for 
the learning of the kernel of the lan- 
guage, i.e., for the learning of the 
structure of the simple declarative 
English sentence. This constitutes 
enough of a problem already. More- 
over, any proposed theory which fails 
to account for the learning of the 
kernel will, a fortiori, fail to account 
for the learning of the structure of the 
language as a whole. If a viable 
account of the learning of the kernel 
can be found, then the fact that all 
the sublanguages have so very much 
in common (the same vocabulary and 
parts of speech, and many of the same 
word arrangements) suggests that 
there may be some hope of treating 
the learning of the other sublanguages 
as a problem in transfer of training.
In any case the remainder of this
discussion will consider only whether 
the proposals advanced earlier can be 
worked into a defensible account of 
the learning of the kernel grammar. 

Even with the scope of the proposals 
narrowed in this way, there still 
remains the problem that the verbal
environment in which the child learns 
contains both kernel and transforms, 
and therefore does not consistently 
present to him the same parts of 
speech in the same positions. It is 
difficult to evaluate this problem. 
On the one hand, it is noticeable that 
adults tend to simplify their language 
when speaking to very young children;
also, since transforms usually change 
the normal word order in minor ways, 
it is probably true that most stretches 
of speech exemplify the normal posi- 
tional arrangements more than they 
distort them. On the other hand, in 
the early stages of development the 
positional relationships in the child’s 
own language obviously do not simply 
mirror the relationships in the adult 
English around him, but must be 
related to them in a much more com- 
plex manner. Almost nothing is 
known about the sequence in which 
various structures of English develop, 
so there is no point in discussing this 
question further. 


Contingencies between Positions 


Even as a theory of the learning of 
the kernel grammar, the proposals 
advanced are insufficient. The funda- 
mental stumbling block is that the 
parts of speech which can occur in one 
position are frequently contingent on 
what part of speech occurs in some 
other position. Another way of ex- 
pressing this difficulty is to say that 
the proposals advanced predict more 
generalization than actually occurs in 
natural languages. Thus, if four parts 
of speech, A, B, C, and D, occur in 
sequences AB and CD, the theory 
predicts that sequences AD and CB 
should also occur, by generalization. 
Yet such unlimited generalization of 
context often fails to occur in natural 
languages. 
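
The overgeneralization just described can be made concrete with a minimal sketch. It assumes a learner that records only which classes have been met in each position; the class names A to D are those of the text, and the rest is illustrative.

```python
from itertools import product

# Observed two-position sequences of parts of speech.
observed = [("A", "B"), ("C", "D")]

# A purely positional learner records which classes fill each position...
first_position = {first for first, _ in observed}
second_position = {second for _, second in observed}

# ...and then accepts any first-position class followed by any second-position class.
predicted = set(product(first_position, second_position))

overgenerated = sorted(predicted - set(observed))
print(overgenerated)  # [('A', 'D'), ('C', 'B')]
```

Nothing in the positional record distinguishes the two unobserved sequences from the two observed ones, which is exactly the difficulty.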

The concept of a primary phrase. 
The above difficulty arises in part
because the proposals advanced make 
no provision that sentences be seg- 
mented so that generalization only 
occurs between comparable units. 
Consider the predicate phrases in such 
sentences as GEORGE IS THROWING, 
GEORGE IS READING, GEORGE IS 
THROWING ACCURATELY, GEORGE IS 
READING THE BOOK. According to the 
binary fractionation model, in the 
first two sentences, the first predicate 
position is occupied by IS and the
second position by THROWING or
READING; in the other two sentences
the first predicate position has IS
THROWING or IS READING, and the
second position ACCURATELY or THE
BOOK. If the content of the first
position in one case is recombined 
with that of the second position in the 
other case one obtains such predicate 
phrases as IS ACCURATELY, IS READING 
THROWING. As they stand, the pro- 
posals can be interpreted to predict 
that such recombinations would occur 
by generalization. To avoid such 
absurdities the theory clearly has 
to regard the predicate division 
of GEORGE IS THROWING into IS 
+ THROWING as somehow less im- 
portant than the predicate division of 
GEORGE IS READING THE BOOK into 
IS READING + THE BOOK. The sim- 
plest way to accomplish this is to set 
up a distinction between “primary 
phrases” and “components of phrases” 
(i.e., morphemes) : IS THROWING, THE 
BOOK, ACCURATELY are “primary 
phrases,” each having component 
morphemes (e.g., IS, THROW, -ING;
ACCURATE, -LY). The proposals advanced
must contain two levels, and
must state specifically that it is the 
locations of expressions which are 
primary phrases, or whose parts are 
primary phrases, that are learned, and 
that within each type of primary 
phrase the locations of the component
morphemes are also learned. With
the proposals thus modified, one 
could say that the first part of a 
sentence is a noun phrase, and that 
the second part is either a verb 
phrase, or has as its first component 
a verb phrase. A statement of this 
kind is not possible unless a primary 
phrase such as a verb can be treated 
as a single unit, regardless of whether 
it is THROWS, THROW, IS THROWING, 
HAS THROWN, HAS BEEN THROWING. 
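
The contrast between the unmodified binary fractionation and the two-level version can be sketched as follows. The example predicates are those of the text; the function name and the way the segmentations are stored are assumptions made only for illustration.

```python
from itertools import product


def recombine(segmentations):
    """Pool first-position and second-position contents, then cross them freely."""
    firsts = {first for first, _ in segmentations}
    seconds = {second for _, second in segmentations}
    return {f"{a} {b}" for a, b in product(firsts, seconds)}


# Binary fractionation of the four predicates, as described in the text.
binary = [
    ("IS", "THROWING"),
    ("IS", "READING"),
    ("IS THROWING", "ACCURATELY"),
    ("IS READING", "THE BOOK"),
]
attested = {f"{a} {b}" for a, b in binary}
flat_errors = recombine(binary) - attested

# Two-level version: the verb phrase is one primary phrase, optionally
# followed by a second primary phrase ("" stands for no second phrase).
verb_phrases = {"IS THROWING", "IS READING"}
second_phrases = {"", "ACCURATELY", "THE BOOK"}
two_level = {f"{v} {c}".strip() for v, c in product(verb_phrases, second_phrases)}

print(sorted(flat_errors))
print(sorted(two_level))
```

With the verb phrase treated as a unit, recombination can no longer split IS from its participle, so strings such as IS ACCURATELY are not produced.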

Primary phrases typically consist of 
an open-class morpheme (e.g., noun, 
verb) together with one or more 
closed-class morphemes (e.g., article, 
auxiliary verb). Since the closed- 
class morphemes tend to be specific 
to the primary phrases to which they 
belong (e.g., articles to noun units, 
-LY to adverbial units), and tend to 
recur much more frequently than the 
open-class morphemes, they could 
serve as cues to “mark” or “tag” 
primary phrases. They would then 
facilitate the learning of the locations 
of primary phrases. 

Some support for the concept of a 
primary phrase unit is provided by 
Glanzer (1962), who has shown that 
in paired-associate learning, although 
associates are learned more readily to 
open- than closed-class words in 
isolation, the opposite is true when 
the words are embedded in a non- 
sense syllable context. For example, 
although OF is a more difficult associate
than FOOD, TAH OF ZUM is easier
than YIG FOOD SEB. Glanzer interprets
the results to indicate that
closed-class words are incomplete 
units when isolated from context. 

The notion of a primary phrase also 
receives some confirmation in the 
stress sequences in English, since the 
normal English intonation seems gen- 
erally to mark with some degree of 
strong stress the principal word of 
each primary unit in simple sentences
(e.g., OUR NEIGHBOUR IS READING
THE RÍOT-ACT QUITE VOCÍFEROUSLY
TO HIS CHILDREN IN THE
GARDEN).

One new problem is created by the 
concept of a primary phrase: one now 
has to enquire what makes the phrase 
a unit. What, if the metaphor will be 
forgiven, provides the psychological 
cement which binds the morphemes 
of a primary phrase more closely to 
each other than to other components 
of an utterance? Discussion of this 
question is temporarily deferred. 

Contingencies still incompatible with 
the proposals. Even when utterances 
are treated as segmented into primary 
phrases in the above manner, there 
are still very many cases where the 
occupancy of one position is con- 
tingent on the occupancy of other 
positions. Thus, in many English 
dialects, a small class of verbs (e.g., 
IS, BECOME, SEEM) yield predicate 
phrases consisting of verb + adjective, 
whereas following other verbs one 
usually finds adverbs (e.g., THE
WIND BECAME VIOLENT, but THE 
WIND BLEW VIOLENTLY; THE OPPOSI- 
TION SEEMED SKILLFUL, but THE 
OPPOSITION OBJECTED SKILLFULLY). 
The widespread phenomenon of gram- 
matical agreement constitutes another 
kind of contingency between positions. 
Consider, for example, the case of the 
arbitrary gender distinction in French. 
Masculine and feminine articles, and 
endings elsewhere in the sentence, 
occur in the same positions as each 
other; according to the theory as it
now stands, their contexts should
generalize, with feminine articles and endings
occurring in association with
masculine nouns and vice versa. A
French child constructed according to 
the theory should get thoroughly con- 
fused. Since French children ap- 
parently contrive not to be, there is 
clearly something lacking in the proposals
advanced. Within primary
phrases a contingency of one morpheme
on the presence of another in
the same unit is particularly frequent. 
There is, for instance, the contingency 
between certain prepositions and 
verbs (e.g., TEAR UP, THINK OVER), or 
between certain words and noun- 
making particles (e.g., INVOLVE-MENT, 
but DEPEND-ENCE). Such contin- 
gencies become very elaborate in the 
varied declensions and conjugations 
of highly inflected languages. Thus, 
to write a computer program which 
would produce Latin verb forms, one 
would have to proceed more or less by 
setting up paired lists A and a,
B and b, C and c, . . . etc.; on List A
would be the first conjugation stems,
and on List a the first conjugation
endings; Lists B and b, C and c,
D and d would cover the second, third,
and fourth conjugations, and Lists E
and e, F and f, . . . Z and z would
take care of the irregular forms. A 
long disjunctive instruction would be 
required stating that a verb consists 
of a pair of items either from A and a, 
or from B and b, or C and c, . . . or 
Z and z. According to the theory 
proposed, such an intricate system 
should be very difficult to learn: the 
stems and endings should both gen- 
eralize in the course of learning with 
the result that the whole complex 
system should collapse into simple 
structure—one set of stems and one 
set of endings; it would, according to 
the theory, be simpler that way. Yet 
apparently it was not. It is, of course,
possible that the Roman child did 
overgeneralize in this way at some 
point in his development, much as 
English-speaking children occasionally
form participles like SINGED and
BROKED; but such generalizations 
must clearly have been relatively 
transitory and easily corrected, other- 
wise the language itself would pre- 
sumably have been less stable. 
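
A program of the kind described might look roughly like the sketch below. It covers only four regular conjugations and two present-tense endings each; the particular verbs, the way stems are cut, and the function names are assumptions chosen for illustration, and the irregular lists E to Z are omitted.

```python
from itertools import product

# Paired lists: (stems, endings) for each conjugation. Illustrative
# third-person present forms only; the choice of verbs is an assumption.
PAIRED_LISTS = {
    "first": (["am", "laud"], ["at", "ant"]),
    "second": (["mon", "vid"], ["et", "ent"]),
    "third": (["reg", "duc"], ["it", "unt"]),
    "fourth": (["aud", "ven"], ["it", "iunt"]),
}


def paired_forms(lists):
    """Disjunctive rule: a verb is a stem and an ending from the SAME pair of lists."""
    return {
        stem + ending
        for stems, endings in lists.values()
        for stem, ending in product(stems, endings)
    }


def collapsed_forms(lists):
    """Full generalization: one set of stems, one set of endings, freely combined."""
    all_stems = {stem for stems, _ in lists.values() for stem in stems}
    all_endings = {ending for _, endings in lists.values() for ending in endings}
    return {stem + ending for stem, ending in product(all_stems, all_endings)}


licensed = paired_forms(PAIRED_LISTS)
overgeneralized = collapsed_forms(PAIRED_LISTS) - licensed
print(len(licensed), len(overgeneralized))
```

The collapsed version is the simpler program, but most of what it produces is not licensed by the paired lists; within this toy table, "overgeneralized" means only that a string pairs a stem with an ending from another conjugation's list.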

There are therefore two central 
problems which cannot be handled by
the theory as it now stands: the con- 
tingencies between the contents of 
different positions, and the question 
as to what causes the morphemes of a 
primary phrase to “go together” as a 
unit. 

Learning contingencies. Let us add 
to the theory the assumption that the 
subjects learning a language tend to 
form associations—similar to those 
studied in paired-associate experi- 
ments—between morphemes of the 
language, and let us see if the addition 
of this assumption can provide a 
solution to the above problems. The 
assumption itself is not an extravagant 
one. That English-speaking subjects 
do indeed form such associations is 
amply demonstrated by the work of 
Miller and Selfridge (1950) and 
others. 

The notion that learned paired 
associations between morphemes play 
a role in learning grammatical struc- 
ture can only be plausible under 
certain circumstances. If one is 
forced to assume that every member 
of one of the large open classes of 
morphemes becomes associated with 
every member of another such class, 
the sheer number of associations 
posited becomes astronomical, and the 
assumption therefore implausible. If, 
however, one can assume that the foci 
of associative bonds are typically 
members of the closed morpheme 
classes (i.e., articles, prepositions,
plural -S, pronouns, auxiliary verbs,
verb endings -ING, -ED, etc., adverbial
-LY, noun and verb suffixes -MENT,
-ENCE, -IZE, etc.), the assumed number 
of paired associates learned, though 
still large, is very considerably less 
large. If the validity of the law of 
exercise is assumed, i.e., that the 
strength of an associative link is 
proportional to the frequency of oc- 
currence of the pair, then, given the 
fact that the most frequently occurring
morphemes are members of the
closed classes, it is inevitable that 
these morphemes should become the 
foci of associational links. From the 
law of exercise one would predict that 
the strongest associative bonds should 
form between closed-class morphemes, 
as, for instance, between prepositions 
and articles (e.g., NEAR THE, OF A, IN
SOME), or between auxiliaries and verb
endings (e.g., IS —ING, HAS —ED,
HAS —EN); links somewhat less strong
should occur between closed- and 
open-class morphemes—thus an as- 
sociation of some strength might be 
expected to be learned between the 
articles and virtually all the nouns in 
an English speaker's vocabulary, and
between the auxiliary and verb end- 
ings and the verb stems. Paired- 
associate links of negligible strength 
would be expected to form between 
pairs of open-class words, except for 
a few pairs—few in proportion to the 
total number of possible pairs— 
which recur together frequently for 
ecological rather than grammatical 
reasons (e.g., DRIVE-CAR, DRINK-COFFEE).⁵
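
The prediction from the law of exercise can be illustrated by simple counting. The sketch below takes link strength to be the number of times two morphemes occur within two positions of each other; the three-utterance corpus, the window size, and the list of closed-class morphemes are all invented for illustration, so the resulting ordering shows the argument and is not evidence for it.

```python
from collections import Counter

# Hypothetical closed-class morphemes for this toy corpus.
CLOSED_CLASS = {"THE", "IN", "NEAR", "IS", "-ING"}

# Hypothetical morpheme-segmented utterances.
corpus = [
    ["THE", "DOG", "IS", "RUN", "-ING", "NEAR", "THE", "HOUSE"],
    ["THE", "MAN", "IS", "READ", "-ING", "IN", "THE", "GARDEN"],
    ["THE", "DOG", "IS", "DRINK", "-ING", "COFFEE", "NEAR", "THE", "DOOR"],
]

WINDOW = 2  # assumed: a morpheme is paired with the next two morphemes

# Law of exercise: strength grows with the frequency of occurrence of the pair.
strength = Counter()
for utterance in corpus:
    for i, first in enumerate(utterance):
        for second in utterance[i + 1 : i + 1 + WINDOW]:
            strength[(first, second)] += 1


def kind(pair):
    """Classify a pair by how many of its members are closed-class morphemes."""
    closed = sum(morpheme in CLOSED_CLASS for morpheme in pair)
    return ("open-open", "closed-open", "closed-closed")[closed]


max_strength = {}
for pair, count in strength.items():
    label = kind(pair)
    max_strength[label] = max(max_strength.get(label, 0), count)

for label in ("closed-closed", "closed-open", "open-open"):
    print(label, max_strength[label])
```

Because the closed-class morphemes recur in every utterance while the open-class words change, the strongest links necessarily attach to the closed classes.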

Most cases of contingencies between 
positions can be schematized more or 
less as follows. One of two (or more) 
words, or classes of words may occupy 
the first position in a construction. 
Let us call these A’ and A”. These 
are followed by a word class, P, which 
is in turn followed by elements x and 
y, the choice of x or y being contingent 
on whether A’ or A” occurs in first 


5 The recent finding (Glanzer, 1962) that 
closed-class words elicit a greater variety 
of responses than open-class words in free 
word association, appears to support these 
contentions. However, it is not certain that 
free association data are germane, since there 
may be factors of set operative in free 
association experiments, which lead subjects 
to prefer responses based on meaning (e.g., 
perhaps, BLACK-WHITE) to responses that 
reflect directly learned verbal contingencies 
(e.g., perhaps, BLACK-BOARD). 
position. That is, if an A’ occurs then
x occurs, if an A” then y, giving 
grammatical sequences A’Px, A”Py; 
the elements x and y thus “agree” 
with the earlier part of the utterance. 
It is possible, of course, for x and y 
to precede P, in which case the 
formulae would read A’xP, A”yP, 
with the covarying elements con- 
tiguous. In terms of the theory one 
must assume that in the course of 
learning contingencies of this nature a 
paired-associate link forms between x 
and every A’, and between y and 
every A”. If A’ and A” are members
of a closed word class, as is often the 
case (e.g., pronouns each individually 
linked with particular verb endings x, 
y, . . .), the assumption seems to the 
writer reasonably plausible since the 
number of associations assumed is 
quite limited, and the amount of 
practice enormous. Frequently A’ 
and A” are not completely different 
words but are distinguished only by 
accompanying closed-class elements. 
Thus A’ may take the form Af or fA 
and A” the form Ag or gA, where f 
and g are closed-class morphemes, 
either affixes or separate elements like 
articles or auxiliaries. The formulae 
then take the form AfPx, AgPy (or 
fAPx, gAPy). Gender in French, and 
the singular-plural distinction in Eng- 
lish and a large number of languages 
seem to be examples of this type.⁶
The only associative linkages which


6 It is here assumed that the absence of an
ending which is frequently present can itself
serve as the focus of an association. Structural
linguists frequently treat the absence
of an element as itself an element; thus, in
the same way that BOYS is analyzed as BOY
+ -S, many grammarians analyze BOY as
BOY + ø, where ø is the “zero ending” associated
with the singular. The notion that the
absence of a frequently present cue can itself
serve as a cue does not seem objectionable
psychologically. The English concord can
thus be schematized as NøVs or NsVø
(N = noun, V = verb, ø = zero ending);
this exemplifies the text formula.


need be assumed are between f and x,
and g and y. 

The theory has now developed all
the ideas necessary for understanding
grammatical frames (e.g., in English
a grammatical frame consists of an
arrangement of closed-class morphemes
and dashes, such as THE —
—S —LY, which completely determines
the parts of speech going in
each vacant position). If a part of a
language fulfills the formulae fAPx, 
gAPy, and if it is assumed that the 
dependencies of fA and gA units on 
first position and of Px and Py units 
on second position are learned in 
addition to the paired associates f-x, 
g-y, then it is clear that the frames 
f — — x, g — — y will determine the
parts of speech occupying the blanks. 
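
For the English concord, the two assumptions (learned positions plus the associations f-x and g-y) can be sketched as below. The noun ending plays the part of f or g and the verb ending the part of x or y, with the empty string standing for the zero ending; the function names and example words are hypothetical.

```python
# Learned associations between closed-class elements: noun ending -> verb ending.
# "" stands for the zero ending, so the table encodes NøVs and NsVø.
ASSOCIATION = {"": "S", "S": ""}


def agrees(noun_ending, verb_ending):
    """True when the verb ending is the one associated with the noun ending."""
    return ASSOCIATION.get(noun_ending) == verb_ending


def fill_frame(noun_stem, noun_ending, verb_stem):
    """Fill the frame THE noun verb: positions are fixed, and the verb
    ending is selected by its association with the noun ending."""
    verb_ending = ASSOCIATION[noun_ending]
    return f"THE {noun_stem}{noun_ending} {verb_stem}{verb_ending}"


print(fill_frame("BOY", "", "RUN"))   # THE BOY RUNS
print(fill_frame("BOY", "S", "RUN"))  # THE BOYS RUN
print(agrees("S", "S"))               # False
```

Only two associations are stored, however many nouns and verbs enter the frame, which is the economy the argument relies on.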

From the assumption that children 
learn both positional regularities and 
paired associations between mor- 
phemes, it is possible to deduce the 
conditions under which overgeneral- 
ization would be expected to occur in 
the course of learning declensions and 
conjugations. Such errors would de- 
pend on the relative rates of learning 
positional regularities and associa- 
tions. If the positional regularities 
are learned more rapidly, children 
should pass through a stage where 
“errors” such as SINGED and BROKED 
are common. The relative paucity of 
inflexions in English gives the English- 
speaking child less scope for errors of 
this nature than children learning 
some other languages; it would be 
interesting, for example, to know the 
extent and kinds of overgeneraliza- 
tions that the elaborate case systems 
and conjugations of Russian or Finn- 
ish give rise to in the children of these 
countries. Guillaume (1927) provides 
some data on the development of the 
verb inflections in French. According 
to Guillaume, initially verbs tend not 
to be inflected: one form is used 
regardless of context. For example,
in the case of the verb TENIR, of which 
the three most used forms are TENIR, 
TENU, and TIENT (or TIENS—TIENS and 
TIENT are, of course, phonetically 
identical), only TIENT may be used, 
e.g., IL A TIENT instead of IL A TENU.
At a somewhat later stage the verb is 
inflected but overgeneralizations are 
frequent—Guillaume cites the in- 
finitive TIENDRE (instead of TENIR), 
the participle ÉTEINDÉ (instead of 
ÉTEINT), the imperfects PRENDAIS, 
TIENDAIS (instead of PRENAIS, TENAIS). 
Of particular interest is the fact that 
alternative incorrect instances of the 
same form are uttered in quick suc- 
cession by the same child. Thus one 
child is reported as using both the 
participles BUVÉ and BUVU (instead of
BU, from BOIRE) indiscriminately at
one period; similarly, both OUVRI and
OUVERT, PRIS and PRENDU, are re- 
ported as occurring in the same or 
successive utterances. If positional 
regularities (e.g., learning to place 
-É, -I, or -U at the end of the verb 
stem) are learned more rapidly than 
paired-associates (e.g., associations of 
particular endings -É, -I, or -U with
particular verb stems) are formed,
then it would be predicted that such
forms as BUVÉ and BUVU might both
occur temporarily without a marked
preference for one or the other being
shown.⁷


7 It would probably be wrong to view the 
learning of conjugations as simply an as- 
sociating of particular stems with particular 
endings. In the Indo-European languages 
conjugations tend to have a characteristic 
thematic stem vowel. Thus in French the
stems of two conjugations end with a thematic
-E or -I in many forms, and the “-RE” verbs
have a terminal consonant (cf. the futures
DONN-E-R-AI, FIN-I-R-AI, VEND-Ø-R-AI). In
Latin thematic long vowels -Ā, -Ē, -Ī were
characteristic of three conjugations and in the 
remaining conjugation a consonantal (or 
short vowel) stem was thematic. It would 
follow from the general line of thought pre- 
sented that these thematic vowels (or their 
If cases exist where associations are
formed more rapidly than the posi- 
tional regularities are learned, a 
different kind of error would be ex- 
pected—one in which there is some 
inversion of the normal order of 
morphemes. Invented English ex- 
amples might be FALL-DOWNED, or 
HANG-UPING. 

The notion that associations form 
between pairs of morphemes may 
also provide an answer to the question 
raised earlier as to what makes the 
morphemes of a primary phrase go 
together as a unit. If the strength 
of an associational link between mor- 
phemes is a function of the proximity 
of the morphemes in the sentence as 
well as of the frequency of their joint 
occurrence, it would follow that in the 
great majority of utterances each 
open-class morpheme will be more 
strongly associated with the other 
morphemes of the primary phrase in 
which it occurs than with morphemes 
outside the phrase. 


Conclusion 


The preceding discussion indicates 
that the line of thought developed 
from the experiments described cer- 
tainly cannot provide a general theory 
of the learning of grammatical word 
order. However, the gross facts of 
English structure reviewed appear to 
be compatible with a modified version 
of the theory, restricted in scope. The 


absence) should permit a considerable econ- 
omy in the number of associations learned, 
since often it need only be assumed that 
associations form between the radical and the 
thematic vowel on the one hand, and be- 
tween the vowel and those endings specific 
to a conjugation on the other. In Latin the 
three long-vowel conjugations sometimes 
shared endings which were not possessed by 
the other conjugation, so that one might 
speculate that the length of the stem vowel 
itself may have been a focus for associations, 
permitting a further reduction in the total 
number of associations to be learned. 
necessary restrictions in scope are the
limitation of the theory to the learning 
of the kernel grammar (although 
further study may permit extension 
to transformations), and the exclusion 
of the learning of contrasts between 
word orders. 

As modified the theory proposes: 
(a) “What is learned” are the loca- 
tions of units, and associations be- 
tween pairs of morphemes. (b) The 
location learned is the location of a 
unit within the next-larger containing 
unit of a hierarchy of units. There are 
hierarchies at two levels: within sen- 
tences the units are primary phrases 
and sequences of primary phrases;
within primary phrases the ultimate 
units are morphemes. (c) The learn- 
ing of locations is a case of perceptual 
learning—a process of becoming famil- 
iar with the sounds of units in the 
temporal positions in which they 
recur. 
THEORETICAL NOTES


PERSONALITY AND PERCEPTION IN THE RECOGNITION
THRESHOLD PARADIGM¹


BERNHARD KEMPLER AND MORTON WIENER
Clark University 


Neither a “preperceiver” explanation (e.g., “perceptual defense”) nor
a “response hierarchy” explanation alone appears to be adequate for 
clarifying the personality-perception relationship in the recognition 
threshold paradigm. An alternative theoretical model is suggested in 
which “recognition” is held to be a function of the S’s characteristic 
response to the supraliminal part-cues available to him on each exposure 
trial. The several theoretical formulations which include the part-cue 
notion are reviewed. The proposed reformulation requires that cues 
available to S be identified and systematically controlled, and that 
differences in response characteristics both within and between Ss be 
specified independently of the experimental recognition response.
Possible experimental methodologies are suggested.


Investigations of the relationship of 
personality to perception have been of 
considerable interest to personality theo- 
rists as well as those interested in per- 
ception. For almost two decades, a 
large number of the studies investigating 
the personality-perception relationship 
have used the recognition threshold 
paradigm. However, because of con- 
ceptual and methodological issues, the 
interpretation of the recognition thresh- 
old data for meaningful verbal stimuli 
continues to be a source of some contro- 
versy, with the personality-perception 
relationship in this paradigm being 
questioned. This very controversy has at 
times overshadowed the original focus of 
interest, i.e., the interaction of personality 
variables and perception in recognition 
threshold behavior. The time may now 


1 This paper is an elaboration of part of the 
general framework being investigated by a 
research project supported by USPHS Grant 
M-3860 of which the second listed author is 
Principal Investigator. This paper was com- 
pleted during the tenure of a Predoctoral 
Research Fellowship, MF-17,030 to the first 
listed author. The authors would like to 
thank Joachim F. Wohlwill for his critical 
reading of early drafts of this paper and for 
his valuable suggestions. 


be ripe for a re-examination of the original
concern. 

The major controversy has been 
whether the differences in recognition 
thresholds for particular classes of stimuli 
or for particular subjects require the
positing of special perceptual processes 
which are the loci of the personality con- 
tribution and are assumed to affect the 
final perception by regulating or select- 
ing what is to be admitted to awareness. 
Those investigators favoring a per- 
ceptual explanation of the data have 
posited constructs such as “perceptual 
defense,” (e.g., Bruner, 1957; Bruner & 
Postman, 1947; Cowen & Beier, 1954; 
McGuinness, 1949, 1950)²; “perception


2 At least one of these investigators, (i.e., 
Postman, 1953) later stressed that such labels 
as “perceptual defense” may refer only to an 
“observed property of certain recognition 
thresholds”, rather than a “mediating” 
process. Other investigators (e.g., Shannon, 
1962) have proposed the use of recognition 
threshold procedures to investigate person- 
ality and perception, but state there is no 
necessity for specifying the mediating proc- 
esses underlying differences in speeds of 
recognition. Shannon holds that the only 
assumption required for the use of the thresh- 
old recognition method in personality inves- 
tigation is that the relative speed of recogni- 
without awareness,” “subception,” (e.g.,
Lazarus & McCleary, 1951) “registra- 
tion,” (e.g., Klein, Spence, Holt, & 
Gourevitch, 1958; Smith & Hendriksson, 
1955). Although each of these constructs
derives from somewhat differing theoretical
biases, two major subclasses can be
delineated. Some formulations (e.g., 
Lazarus & McCleary, 1951) posit a spe- 
cial process (subception), assumed to be 
evoked by particular classes of stimuli or 
events (“threat” or “need”); other for- 
mulations (e.g., Bruner, 1957; Klein, et 
al., 1958; Werner, 1957) posit special per- 
ceptual processes (e.g., physiognomic 
perception, gating, registration) that 
operate for all stimuli and perceptual 
events. However, in all of these formu- 
lations two basic assumptions are in- 
cluded: (a) there are at least two rela- 
tively independent perceptual systems, a 
supraliminal process that operates within 
awareness, and a subliminal process, i.e., 
“gating,” “registration,” or “subcep- 
tion,” which operates outside awareness; 
(b) the latter subliminal process is more 
sensitive, i.e., makes discriminations the 
supraliminal process does not make. In 
each of these perceptual formulations it 
is assumed that the appropriate affective 
or evaluative reaction to the stimulus is 
made within the organism while the sub- 
ject cannot yet discriminate and report 
the stimulus. Implicitly, therefore, the 
meaning of the stimulus or its appropriate 
meaning sphere is apprehended prior to 
correct recognition. 

On the other side of the controversy 
are those investigators who question 
whether recognition threshold data can 
shed any light on the question of the 
personality-perception relationship, or 
that such a relationship need be posited 
from the data. These investigators 
(e.g., Freeman, 1954; Goldiamond & 
Hawkins, 1958; Goldstein, 1962; Howes 
& Solomon, 1950, 1951; Solomon & 


tion is one aspect of the overall adjustive and 
purposive behavior of the individual. Recog- 
nition threshold data may be meaningfully 
related to other behavior patterns of individuals, 
and no postulation of “defensive” 
perceptual operations or any other mediating 
processes need be made. 
Postman, 1952) have shown a systematic 
relationship between particular response 
parameters (e.g., frequency of prior 
usage, recency, expectancy, sets) and 
threshold levels. The general conclusion 
of this group of investigators seems to be 
that a perceptual interpretation of the 
threshold data is unwarranted, and that 
all systematic differences in recognition 
threshold can be formulated as some func- 
tion of known or discoverable response 
parameters. Although some of these in- 
vestigators have suggested that the re- 
sponse characteristics may be related to 
personality (e.g., Solomon & Postman, 
1952), it is not clear from the various expositions 
of the response explanations 
whether any, or how much, variance in 
threshold behavior can be attributed to 
stimulus input. Occasionally the impression 
is even given that response probabilities 
remain constant despite changes in 
stimulus information. For example, 
Goldstein (1962) states: “The results 
indicate that the subject does enter the 
perceptual situation with clearly defined 
response habits which are not under the 
control of the perceptual stimulus and 
which can influence the subject’s recogni- 
tion score [p. 27].” At any rate, the re- 
sponse position considers it unnecessary 
to posit a personality-perceptual rela- 
tionship to account for perceptual de- 
fense and related phenomena in the recog- 
nition paradigm. 

Both of the above alternative interpre- 
tations of the recognition threshold data 
seem to consider response characteristics 
and perceptual processes as mutually ex- 
clusive explanatory categories. While 
the “response” explanation has mini- 
mized the contribution of the available 
part-cues, the “seeing” explanation im- 
plicitly assumes “meaning” availability 
even on those trials where the subjects 
cannot (i.e., do not) identify the stimulus. 
This dichotomy apparently corresponds 
to the statistical and methodological 
distinction between “seeing” variance 
and “saying” variance (Neisser, 
1954). Although this distinction may be 
a useful one in experimentation and for 
the handling of data, care must be taken 
that what begins as a means of partialling 
out the variance of specific empirical results 
(i.e., differences in threshold) does 
not end in obscuring the possible relevance 
of the data for the personality-perception 
relationship. Analyzing threshold 
variances into seeing and saying 
components does not imply that these 
two sources are in any meaningful sense 
independent in perception. Further, any 
theoretical formulations of the relation- 
ship of personality and perception in 
recognition should be congruent with the 
explanations of other perceptual be- 
haviors, and need, therefore, include both 
the stimulus input and the response 
characteristics of the subjects. The 
problem, then, appears to be how to re- 
interpret and reformulate the recognition 
data for the personality-perception rela- 
tionship without, on the one hand, in- 
voking special and unusual processes for 
special instances, or, on the other hand, 
minimizing or ignoring the stimulus in- 
put contribution. 

Some investigators have taken a step 
in this direction by going beyond the 
somewhat facile distinction between seeing 
and saying, and attempting a more detailed 
analysis of the variables which may 
be involved in the recognition situation. 
Eriksen (1956) and Eriksen and Browne 
(1956), for example, have pointed out 
that an incorrect report on a recognition 
trial may be a function of the limited 
number of verbal categories which are 
available to the subject to indicate what 
he “sees.” Bricker and Chapanis (1953) 
and Blackwell (1953) have shown that 
those recognition trials, on which the 
subject reports seeing “nothing,” or gives 
an incorrect report, may nevertheless 
provide the subject with some useful in- 
formation about the stimulus. In a 
similar vein Wiener (1957)³ and Wiener 
and Schiller (1960) have argued that the 
postulation of subliminal processes to ac- 
count for differential recognition thresh- 
olds must be considered premature as 
long as the operation of all supraliminal 


³ M. Wiener, research proposal entitled 
“Perceptual Thresholds: Conditions and 
Parameters,” 1957, National Institute of 
Mental Health, United States Public Health 
Service. 




perceptual factors has not been ruled out. 
All of the above writers appear to dis- 
tinguish among the stimulus as given by 
the experimenter (e.g., a word or phrase), 
the available cue to the subject, i.e., that 
portion of the stimulus which the subject 
perceives “supraliminally,” and the 
subject's verbal report irrespective of its 
correctness, i.e., match of the subject's 
report with the experimenter’s data sheet. 

As has been emphasized by several in- 
vestigators (Goldiamond, 1958; McCon- 
nell, Cutler, & McNeil, 1958; Wiener, 
1957; Wiener & Schiller, 1960) thresh- 
old is a statistical concept rather than an 
absolute point or value. Thus, on any 
given trial or exposure level (particularly 
in the word recognition paradigm) any 
subject may have (be “aware” of) a 
range of information about the stimulus, 
from A, nothing at all, to B, some part 
of the stimulus (part-cue perception), to 
C, a sufficient number of cues so that no 
other than a specific (i.e., the correct) 
response has a significant probability of 
being emitted. The consequences of the 
stimulus inputs for “recognition” should 
differ at these three ranges within the 
perceptual-cue continuum. In Instance 
A (i.e., no input), recognition threshold 
can only be some function of a subject's 
response characteristics and the probabil- 
ity of his pattern of responses matching 
the experimenter’s data sheet (e.g., 
Goldiamond & Hawkins, 1958). How- 
ever, even under this extreme condition, 
the explanation in terms of absolute re- 
sponse hierarchy based on frequencies of 
occurrence in the English language or 
absolute response characteristics of the 
subjects requires some modification. 
There is evidence that a subject’s pattern 
of responses will vary with situations, 
instructions, knowledge of correctness, 
etc. (e.g., Goldstein & Himmelfarb, 1962; 
Smock & Kanfer, 1961). In the instances 
of C (i.e., a great deal of information), 
the contribution of the subject’s response 
characteristics is minimal, since even a 
stimulus word with which the subject is 
unfamiliar would be expected to be cor- 
rectly identified and reported on these 
trials, if we assume some correspondence 
between seeing and saying. 
On most trials in the word recognition 
paradigm, the subject is likely to 
have part of the information available, 
that is, the B range of the perceptual cue 
continuum. On any one trial, the subject 
may be aware of some of the letters, the 
probable length of the word, straight- 
round letter, etc. Further, as is evi- 
dent from psychophysical research, it 
is unlikely that cues remain constant 
from trial to trial even if successive ex- 
posures are made at the same intensity 
and duration. When in addition the 
stimulus presentation conditions are 
changed by an increase in duration, in- 
tensity, or clarity (as is usually the case), 
it is almost certain that the informational 
cues have been modified. This modifica- 
tion (increase) of information, together 
with the information about the in- 
correctness of earlier trial responses 
(Bricker & Chapanis, 1953; Goldstein & 
Himmelfarb, 1962), and the subject’s re- 
sponse to the information about the in- 
correctness of earlier trials, should radi- 
cally influence the saying behavior of the 
subject. For example, if the word 
TAPIR is presented by the experimenter, 
and only the TA— is available to the 
subject on the first trial, the word 
TABLE may be a probable response. If on 
subsequent trials the subject has the information 
TA—R, TABLE is no longer a 
probable response. Whether the probability 
of TAPIR has or has not been 
significantly altered by the additional 
cue should depend on the individual 
subject’s response characteristics in the 
presence of the cue TA—R (e.g., whether 
the subject uses English, his verbal 
fluency, probability of using particular 
nouns, etc.). 
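
The TAPIR example can be put as a minimal sketch, under stated assumptions. The word list, the numerical response strengths, and the function names below are hypothetical illustrations and are not data from any of the cited studies; the sketch also assumes that the length of the word is among the available cues. It shows only that the saying probabilities are recomputed over the responses compatible with whatever cue is available on a given trial.

```python
# Hypothetical sketch: saying probabilities conditioned on the available part-cue.
# A '-' in the cue marks a letter position that the subject did not see.

def matches(cue: str, word: str) -> bool:
    """True if `word` is compatible with every seen letter of `cue`."""
    return len(cue) == len(word) and all(c in ("-", w) for c, w in zip(cue, word))

def saying_probabilities(cue: str, strengths: dict) -> dict:
    """Renormalize response strengths over the responses that fit the cue."""
    compatible = {w: s for w, s in strengths.items() if matches(cue, w)}
    total = sum(compatible.values())
    return {w: s / total for w, s in compatible.items()} if total else {}

# Assumed response strengths for one subject (illustrative numbers only).
strengths = {"TABLE": 50.0, "TAKEN": 30.0, "TUMOR": 5.0, "TAPIR": 1.0}

trial_1 = saying_probabilities("TA---", strengths)  # only TA— available
trial_2 = saying_probabilities("TA--R", strengths)  # TA—R available
```

With these assumed strengths TABLE is the most probable response on the first trial and is not a candidate at all on the second, while TAPIR moves from the least probable compatible response to the only one. A different set of strengths would change the first-trial ordering, which is the sense in which the outcome depends on the individual subject's response characteristics.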

A theoretical position that attempts 
to account for differential recognition 
thresholds wholly in terms of response 
parameters implies that stimulus input 
variables (intensity, duration, number of 
cues) do not essentially affect differences 
in recognition threshold. Thus, an ex- 
perimental situation, in which a subject 
emits responses in the absence of stimuli 
(Goldiamond & Hawkins, 1958; Gold- 
stein, 1962) has been proposed as an 
adequate paradigm for all recognition 
threshold behavior. It seems important 


THEORETICAL NOTES 


in this connection to point out that the 
subject’s response on one trial, when one 
cue, e.g., TA—, is available is not necessarily 
continuous with his response on 
the next trial, when different cues may 
be available. The relative saying prob- 
abilities of TABLE and TAPIR do not change 
gradually but abruptly with changes in 
available stimulus information. Thus, 
the addition of the single cue —R does not 
merely reduce the probability of TABLE; 
it makes this response highly improbable. 
Consequently, it is inappropriate to 
speak only of the relative probabilities of 
TABLE and TAPIR as if, irrespective of the 
available cue (i.e., seeing) conditions, 
these responses had fixed positions on a 
“pure” response hierarchy. To make 
this latter position tenable one would 
have to maintain that either (a) changes 
in stimulus information do not affect 
subjects’ saying probabilities, or (b) 
changes in stimulus information do not 
occur on prerecognition trials. It is 
difficult to see how either of these statements 
is tenable. Constant seeing over 
trials may, as has been shown above, 
occur at the extremes of the perceptual 
cue continuum, where either none or 
almost all of the cues specified by the 
experimenter are available to the subject. 
Under these conditions perceptual infor- 
mation changes little, if at all, from trial 
to trial, and seeing does not contrib- 
ute significantly to differential thresholds. 
However, at those exposure levels, where 
different cues are available on successive 
trials, what is “seen” must be considered 
an important parameter of recognition 
threshold. 

The part-cue response-characteristic 
model leads to the general reformulation 
that differences in recognition threshold 
for different words (or for different sub- 
jects for the same word) can be considered 
a function of differential response char- 
acteristics of a subject (or between sub- 
jects) to the specific seen part-cues. This 
formulation does not minimize information 
input as does the frequency of response 
explanation, nor does it assume 
“supersensitive registration” of seeing 
under marginal stimulus conditions, as 
does the “perceptual” explanation. Further, 
the part-cue response-characteristic 
formulation explicitly rejects the possible 
implication which at times is unwittingly 
suggested by an extreme response ex- 
planation that there is a pure response 
hierarchy which remains constant over 
the whole perceptual cue continuum. 
For, to imply that responses made in the 
absence of an experimenter-defined stim- 
ulus constitute a general behavior ten- 
dency which would be manifested also in 
the presence of informational cues, is to 
posit an absolute response disposition, 
which seems as unwarranted as a pure 
perceptual explanation. 

Several formulations have appeared in 
the literature that have included the 
concepts of part-cue perception. While 
some of these formulations seem some- 
what similar to the proposed part-cue 
response-characteristics explanation, a 
careful examination of their underlying 
assumptions reveals several points of 
divergence. Three types of part-cue 
theories may be distinguished: condi- 
tioned perceptual avoidance, hypothesis 
theory, and response avoidance. 

Perceptual avoidance theory (e.g., 
Allport, 1955; Eriksen, 1954; Osgood, 
1957) holds that perceptual avoidance 
responses have been conditioned to frac- 
tional elements of stimuli. The main 
points of this position are evident in a 
quotation from Allport (1955): 


The term perceptual defense is misleading 
in that it suggests that defense is accomplished 
through perception (i.e., through the abortive 
character or self retardation of the perceptual 
process), an interpretation that raises the 
dilemma of a subconscious, pre-perceiving 
perceiver. If we could consider some frac- 
tional stimulus element in the situation, 
rather than the complete and meaningful 
stimulation pattern, can, in short exposures, 
lead the subject to avoid perceiving anything 
further with respect to that stimulus pattern 
(italics ours) until long exposures give him no 
escape from perceiving it, the matter could be 
more simply explained. It would not be a case 
of “perceptual defense,” in the sense of raising 
threshold, but simply of an inhibition of per- 
ceiving that has been conditioned to certain 
(actually perceived) cues [p. 333]. 


This formulation posits the delay of con- 
scious perception by the learning of 
avoidance of further perceptual responses 


conditioned to fractions of the “threatening” 
stimulus. However, as Eriksen 
(1958) and Wiener and Schiller (1960) 
have also noted, the inference that there 
is learning of avoidance of further per- 
ceptual responsiveness as a function of 
the available part cues implies that the 
meaning of the stimulus (i.e., threat class) 
is somehow carried by the fractional ele- 
ments already perceived. It is only with 
this implicit assumption that it is pos- 
sible to account for differences in the 
recognition of words from different 
classes. However, as far as is known, in- 
formation about word meaning or class 
membership of words is not available in 
the fractional elements of words, but 
only in the apprehension of the whole 
word and its meaning (e.g., “whore” 
versus “where”). The assumption that 
there are differing structural characteristics 
of the partial cues of “neutral,” 
“need,” or “threat” stimuli appears to be 
untenable. Therefore the systematic 
perceptual avoidance reaction (i.e., no 
further perceiving of certain classes of 
stimuli) can only be posited with the assumption 
that the meaning of the stimulus 
has already been apprehended, 
which appears to be a special-process 
perceptual explanation. 

A second type of formulation which appears 
to include part-cue perception is 
“hypothesis” or “expectancy” theory 
(e.g., Postman, 1953). The distinguish- 
ing feature of this view is that the avail- 
ability of information about the stimulus 
is contingent on a predisposition to see 
particular cues. That is, the subject sees 
and “organizes” cues in accordance with 
a specific predisposition. Apparently it 
is possible under a particular hypothesis 
for cues to be seen in a modified form, so 
that they are perceptually experienced as 
being different from the objective charac- 
teristics of the stimulus. In this view, 
the subject's report matches the subject's 
perception but the perception itself may 
differ from the objective stimulus. The 
locus of the modification is in the registra- 
tion rather than in the response to what is 
registered. In contrast to this implication 
in the “hypothesis view,” the concept 
of “distortion” of the stimulus is not 
included in the part-cue response-characteristic 
formulation. Incorrectness of 
a recognition response is held to be relevant 
only to the experimenter’s data sheet 
or some consensual criteria, and not to 
the subject’s perception. The cues avail- 
able to the subject are limited by (they 
can be no more than) the physical char- 
acteristics of the stimulus and the physio- 
logical condition of the sensory system. 
Insofar as personality variables are in- 
volved in perception, the locus is the 
subject's response elaboration, (limited 
by his previous experience) of the avail- 
able cues (letters, length of words, etc.), 
rather than in the selection or modification 
of the cues themselves. Since all of the 
information about the stimulus availa- 
bility parameters underlying the sub- 
ject’s response are unknown, it would ap- 
pear more reasonable to assume differ- 
ential responses to incomplete inputs 
rather than to assign distortion or modi- 
fication to the stimulus input itself. This 
seems particularly true since it is not 
evident how hypothesis theory attempts 
a systematic specification of the limits 
imposed by the objective characteristics 
of the stimulus or the range of possible 
modification of those cues. Further, 
while hypothesis theory appears to give 
priority to the dispositional state of the 
subject, (i.e., the hypothesis or set), the 
part-cue response-characteristic model 
assumes no priority for the stimulus, the 
stimulus viewing condition nor the re- 
sponse characteristics of the subjects. 
Under specifiable stimulus conditions or 
subject states, greater variance contri- 
bution may be attributable to any one of 
these parameters. 

A third formulation including part-cue 
is that proposed by Eriksen and Browne 
(1956). These writers advance a response 
avoidance as distinct from a perceptual 
avoidance model. Some responses from a 
pool of possible responses to a particular 
cue are not made because they have in 
the past been associated with anxiety. 
Thus, each cue evokes a hierarchy of po- 
tential responses, anxiety provoking re- 
sponses being the least likely to occur 
for all levels of cue availability up to the 
point of total stimulus input (i.e., range 
C of the perceptual cue continuum). 
This view approximates most closely the 
one advanced here. The only point of di- 
vergence is that in the part-cue response- 
characteristic view it is not deemed neces- 
sary to include any assumptions about 
the ontogenesis of differences in response 
probabilities within or between the subjects. 
The processes involved in the 
establishment of response characteristics 
are a separate problem from when and how 
specific responses are evoked. Further, 
if anxiety underlies response avoidance 
as suggested by Eriksen and Browne, it is 
difficult to include differential response 
probabilities for words which are affec- 
tively neutral. 

In the light of the distinctions made 
among the various explanations of thresh- 
old behavior and its relevance for the 
personality-perception relationship, further 
elaboration of the part-cue response-characteristic 
formulation seems appropriate. 
First, the term “response 
characteristic” is meaningful in a con- 
text of specified stimulus conditions 
where it refers only to independently 
measured differences in a subject or between 
subjects with specific and constant 
cue availability. Nothing more is implied 
than these specified differences and 
the correlates of the measured criteria. 
Further, this construct is not tautological 
since the differences within or between 
the subjects are specified independently of 
the experimental response; that is, the 
subjects are selected on some prior 
criteria. One additional consequence of 
this definition is that the term response 
characteristics has no dispositional status 
philosophically. It has the same status 
as the statement that a subject will 
change the pattern of his walking when he 
moves from level ground to a steep in- 
cline. The new walking response to the 
change in stimulus conditions is con- 
sidered no more dispositional than is any 
verbal response to the stimulus conditions 
in the recognition threshold paradigm. 
On the stimulus side, none of the pre- 
vious formulations have explicitly re- 
quired independent assessment of the 
available cue to the subjects. Most in- 
vestigators have assumed that the experi- 
mental stimulus is the available cue on all 
trials, or have attempted to infer the cues 
from the correctness or incorrectness of 
the response. In contrast, the part-cue 
response-characteristic formulation re- 
quires independent assessment of the 
specific cues available to the subjects. 
Only with such specification (stimulus 
input and response characteristics) can 
anything be said of the relationship of 
the response characteristics (i.e., the 
personality parameters) and the stim- 
ulus input—the personality-perception 
relationship. 

Within this part-cue response-char- 
acteristic formulation, the fundamental 
concerns are to isolate and systematically 
vary the available cues under specifiable 
presentation conditions; to assess the 
differences in recognition (or other be- 
haviors, such as choice or associations) 
for independently specified subjects, or 
within the same subject under different 
cue and presentation conditions. 

At least two experimental methods are 
immediately suggested, both of which 
have been successfully employed. In one 
(Dowling, 1962) only part of the stimulus 
was made available (i.e., part of the 
word) on all exposure trials and differ- 
ences in threshold were assessed for the 
differing amounts of cues made available 
for the same subjects. In the second 
(Kempler, 1962) some portion of the 
whole stimulus was presented supralimi- 
nally while the remaining portion was 
exposed at quite low intensities to sub- 
jects who had different probabilities of 
responding to the different clearly pre- 
sented cues. 

It is only under conditions where the 
part-cues are known and identical for 
different subjects, that the difference in 
their recognition responses can be at- 
tributed to “personality” variables. Sim- 
ilarly, only with control of the available 
cues, can intraindividual differences in 
recognition behavior be considered per- 
sonality relevant. For example, it is 
possible to investigate the “threshold” 
for the word sex with the available cues 
s*x (the asterisk represents a smudge) or 
s** for different subjects, these subjects 
having been classified on some other criteria 
as “repressed” or “sensitized” to the 
class “sex responsiveness.” Another possibility 
is to investigate the same subject’s 
responses to stimuli such as I FIGHT 
versus THEY FIGHT, where the I and THEY 
are clearly supraliminal and FIGHT in 
both instances is exposed at very low in- 
tensity. In this latter procedure, there 
is no a priori reason to expect that differential 
part-cues for the same word 
(i.e., FIGHT) would become differentially 
available to different subjects under the 
two stimulus conditions. Further, com- 
binations of these two suggested pro- 
cedures are possible, using sentences with 
selected “critical” portions not made 
available to the subjects. Such proce- 
dures make possible the systematic con- 
trols of stimulus input and response 
parameters and permit investigation of 
the personality-perception relationship. 

It is noteworthy that the suggested 
part-cue response-characteristics refor- 
mulation and the procedures for investi- 
gation are highly consistent with much 
of our everyday perceptual behavior, 
such as reading. Reading appears to in- 
volve the systematic scanning and identi- 
fication of part of the cues and the “elab- 
oration” of these cues based upon highly 
learned patterns of sequential occur- 
rences. It may be posited that all per- 
ception involves responding to partial 
information with the particular response 
being some function of previously learned 
co-occurrence probabilities. 

These learned co-occurrence probabili- 
ties may be considered a function of cul- 
ture, class, or individual variations in 
experience. Any differences, between 
and within individuals, however, ir- 
respective of their source, may be used 
for investigating the personality-per- 
ception relationship in this paradigm. 
What specific response parameters are to 
be investigated can only be specified by 
each particular personality theory. 
NUMERICAL PREDICTION OF A SIMPLE FIGURAL AFTEREFFECT 
AS A FUNCTION OF THE CONTRAST OF 
THE INSPECTION FIGURE * 


MAURICE M. TAYLOR 
Defence Research Medical Laboratories, Toronto 


Data recently published by Graham provide a new test of a theoretical 
equation for the amount of displacement in a simple figural after- 
effect, for the condition in which the acuity for the inspection figure 
is varied without alteration in the spatial relationships of the in- 


spection and test figures. 


An equation describing the amount of 
displacement of the test figure has been 
derived from a general psychophysical 
theory of figural aftereffects (Taylor, 
1962b). For the particular condition in 
which the inspection figure and the test 
figure each consist of a single dot, the 
equation has the form 


$$E = \frac{hM/R}{1 + (kM/d)^{2}}$$


where E is the displacement of the test 
point immediately after the end of the 
inspection period, M the separation be- 
tween inspection and test points, d the 
inverse of the acuity for position of the 
test point, and R the ratio between the 
acuities for position of test and inspection 
points. The constants h and k are curve 
fitting parameters. Here and in the fol- 
lowing, the term “acuity” ideally is taken 
to mean the inverse of the threshold for 
difference in position. The “acuity for 
position of the test point” then refers 
to the minimum detectable change in 
position of a point with the same form 
and contrast, and at the same gross posi- 
tion as the actual test point. The greater 
the minimum detectable change in posi- 
tion, the lower the acuity. The measure 
of acuity used in applications of the 
equation has been the inverse of the 
minimum separable interval between two 
points of similar characteristics, which 
has been assumed to be proportional to 
the minimum detectable change in posi- 
tion of a single point. The same practice 
is followed in the present paper. 

The equation has been used success- 
fully to describe the dependence of dis- 
placement on the separation between in- 
spection and test points for seven ex- 
periments in which the single point cri- 
terion was a reasonable approximation to 
the actual conditions. These experiments 
covered a variety of perceptual dimen- 
sions in audition (Krauskopf, 1954; Tay- 
lor, 1962a), vision (Culbert, 1954; Gib- 
son & Radner, 1937; Oyama, 1956), and 
kinesthesis (Charles & Duncan, 1959). 
The fitting parameters took the same 
values (h = 0.2, k = 0.03) for each of the 
auditory and visual experiments, but the 
value of h was increased to 0.55 to fit the 
kinesthetic data. 
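
As a hedged illustration, the equation can be written as a short function. The function and variable names are assumptions of this sketch; h = 0.2 and k = 0.03 are the values quoted above for the auditory and visual experiments, and d = 0.78 minutes is the test-point threshold that the note adopts later in the Prediction section. The sketch checks one property implied by the form of the equation: for fixed d and R, the displacement is greatest when kM/d = 1, that is, at a separation M = d/k.

```python
def immediate_displacement(M, d, R, h=0.2, k=0.03):
    """E = (h*M/R) / (1 + (k*M/d)**2), in the same units as M and d.

    M: separation between inspection and test points
    d: inverse of the acuity for position of the test point (a threshold)
    R: ratio of the acuities for position of test and inspection points
    """
    return (h * M / R) / (1.0 + (k * M / d) ** 2)

d = 0.78                 # minutes of arc; value used later in the note
peak_separation = d / 0.03   # separation at which k*M/d equals 1

e_peak = immediate_displacement(peak_separation, d, R=1.0)
e_used = immediate_displacement(21.5, d, R=1.0)   # separation in Graham's display
```

Under these values the peak falls at a separation of about 26 minutes, and the displacement at 21.5 minutes is within about 2% of the peak value.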

The equation has been tested exten- 
sively with respect to changes in separa- 
tion between inspection and test points. 
The effect of independent change in acuity 
for the inspection point has not been con- 
sidered, although in most of the experi- 
ments whose results were predicted the 
acuity for the inspection point varied as a 
consequence of changes in its position. 
For example, in the auditory location dis- 
placement experiments (Krauskopf, 1954; 
Taylor, 1962a) the position threshold for 
the inspection point changed from about 1° 
when the separation was zero (inspection 
and test directly in front of the subject) 
to infinity when the separation was 90° 
(test point directly in front of the sub- 
ject, inspection point directly to his right 
or left) (Mills, 1958). This change in 
acuity was considered in the theoretical 
prediction, as it was in most of the other 
predictions. No prediction was made, 
however, for the case in which the acuity 
for the inspection point was varied with- 
out changing the separation between in- 
spection and test points, since no experi- 
mental data were available to test such 
a prediction. 

Recently, Graham (1961) has pub- 
lished the results of an experiment in 
which the contrast of and hence the 
acuity for the inspection figure was 
varied, and in which the inspection and 
test figures could each be approximated 
by a single point. Other authors (e.g., 
Fujiwara & Obonai, 1953; Nozawa, 1953; 
Oyama, 1960; Yoshida, 1955, as cited 
by Sagara and Oyama, 1957) have pre- 
viously reported experiments in which 
the contrast of the inspection figure was 
varied, but in these studies, the stimulus 
patterns were too complex to permit di- 
rect prediction from the equation. 

The stimulus patterns and the pro- 
cedure used by Graham have been de- 
scribed in her earlier and more accessible 
paper (Hammer, 1949). Both inspection 
and test figures were small rectangles, 
with a large circular disc as a fixation 
mark. For purposes of prediction, the 
fixation disc will be ignored, and each 
rectangle will be approximated by a 
single point at the midpoint of the side 
next to its neighbor. A different choice 
of position for the approximation point 
would make very little difference to the 
predicted displacement, which is very 
near maximum as a function of the sepa- 
ration between inspection and test figures. 


Prediction 


The inspection figure is approximated 
by a point, 67 minutes below, and 49 
minutes to the left of the fixation point, 
the test figure by a point 21.5 minutes to 
the right of the inspection point. The 
visual position thresholds for 100% con- 
trast at these locations will be taken as 
0.83 and 0.78 minutes respectively (Foxell 
& Stevens, 1955; Linksz, 1952, Figure 
69). The acuity for the inspection 
point will be assumed to vary as the % 
Fig. 1. The effect of contrast on displacement. (Experimental data from Graham, 1961. 
Theoretical curve not fitted to data.) 
power of the percentage contrast (by ex- 
trapolation from data of Blackwell, 1946). 
The theoretical curve is not very sensi- 
tive to changes in this power. The great- 
est change in the predicted displacement 
of any part of the curve by assumption 
of any power from % to unity is only 
about 0.02 millimeters. The parameters 
h and k are given their usual values 
(h=0.2, k=0.03). Substitution of 
these values in the equation provides a 
theoretical prediction of the immediate 
displacement as a function of percentage 
contrast. 

Graham used the method of adjust- 
ment to measure her displacements. 
Ikeda and Obonai (1953) have pointed 
out that the displacement decays rapidly 
during the adjustment process, unless the 
preceding inspection period is very long. 
The measured displacement will therefore 
be less than the immediate displacement, 
and the theoretical prediction must take 
this factor into account. Taylor (1962b) 
has considered the process of the decay, 
and has shown that Graham’s earlier 
data (Hammer, 1949) are consistent with 
a decay time constant of 17 seconds for 
the 60-second inspection period used by 
Graham (1961). Graham reported that 
the subjects completed the adjustment in 
an average of 6 seconds. The measured 
displacement should therefore be approxi- 
mately 0.7 times the immediate displace- 
ment. 
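
The factor of 0.7 can be checked with a short calculation. The sketch below assumes a simple exponential decay evaluated at the end of the adjustment period; the text gives only the time constant (17 seconds) and the mean adjustment time (6 seconds), so the exponential form and the function name `remaining_fraction` are illustrative assumptions rather than Taylor's (1962b) actual derivation.

```python
import math


def remaining_fraction(elapsed_s: float, time_constant_s: float) -> float:
    """Fraction of the immediate displacement left after elapsed_s seconds,
    assuming simple exponential decay (an assumption, not stated in the text)."""
    return math.exp(-elapsed_s / time_constant_s)


# Values quoted in the text: 6-second adjustment, 17-second time constant.
factor = remaining_fraction(6.0, 17.0)
print(f"measured / immediate displacement = {factor:.2f}")
```

Under this assumption the ratio comes out close to the 0.7 used in the text.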


Results 


Graham’s experimental results are 
shown in Figure 1. The zero contrast
condition, rather than the condition of 
no pre-inspection, is taken as the control 
from which the displacements are meas- 
ured, to eliminate possible effects of in- 
spection of the large fixation disc. The 
theoretical displacements are 0.7 times 
the values obtained by substitution in the 
equation for immediate displacement, as 
discussed above. It should be emphasized 
strongly that Graham’s data do not enter 
at any point into the derivation of the 
theoretical curve, so that the curve can 
in no way be regarded as fitted to the 
experimental results. Considering the
reported variability in the experimental
points, the prediction is good. Graham's 
results therefore provide an eighth test 
of the predictive power of the theoretical 
equation, as well as a new test of the 
independent effect of variation in the 
acuity for the inspection figure.


A STOCHASTIC MODEL FOR WORD ASSOCIATION TESTS


The paper presents a mathematical model to describe the distribution
of the frequency of specific response words in the Kent-Rosanoff Word
Association Test. The distribution function which is shown to fit the
data is the Yule Distribution recently used by H. A. Simon to explain
the distribution of word frequencies in texts. This distribution is
characterized by a single parameter, α, which is the ratio of the number
of different response words to the total number of response words. It
therefore allows a simple comparison of the results obtained in different
empirical situations by noting the variation in this one parameter.


The generation of word responses to 
specific stimulus words, as in the Kent- 
Rosanoff Word Association Test, is 
generally described in terms of a hier- 
archy of responses with the most fre- 
quently given word characterized as the 
primary response and the remaining re- 
sponses arranged in decreasing hier- 
archical order. The rate at which the 
frequency of responses falls off with 
decreasing hierarchical order is usually 
described in the literature in rough 
qualitative terms as a “steep” or a “flat”
associative hierarchy. Since qualitative 
descriptions have only limited value in 
scientific studies, it would be highly 
desirable to find a quantitative expres- 
sion for the distribution of response fre- 
quencies in word association tests. A 
distribution function which fulfills this 
requirement was derived originally by 
G. U. Yule (1924) for the distribution of 
biological genera by numbers of species 
and more recently has been applied by 
H. A. Simon (1955) to the distribution 
of word frequencies in texts. 

Although the Yule distribution func- 
tion has not previously been suggested 
for this application, an earlier attempt 
to obtain a quantitative description of 
the frequency of word responses in the 
Kent-Rosanoff Test was made by B. F. 
Skinner in 1937. He computed the 
average frequency at each rank for the 
entire list and showed that the dis- 
tribution followed a Zipf-type of power 
law, the frequency decreasing as the 
inverse 1.29 power of the rank. This
demonstration did not lead to any substantial
application in word association
studies, partly because the concept of 
rank is an awkward way of expressing a 
mathematical relationship (Skinner him- 
self pointed out that the discovery had 
little practical value because the fre- 
quency must be determined before 
ascertaining the rank) and partly because 
the result was an average for a large 
number of stimulus words and could not 
be fitted very well to the data for in- 
dividual words. It will be shown below 
that the stochastic model for the Yule 
distribution proposed by Simon over- 
comes these difficulties and can be 
applied to each individual stimulus word 
with but a change in a single parameter.
Furthermore, Simon and others have 
pointed out that a Zipf-type of distribu- 
tion may be obtained as an approxima- 
tion to the Yule distribution function so 
that the application of a stochastic model 
may be regarded as formally identical to 
the Zipf approximation with the added 
advantage of a mathematical model 
which can be directly applied to all cases. 


Stochastic Model 


The stochastic model used by Simon 
to derive Yule’s distribution function is 
based on the following two assumptions: 

First, consider a text that has reached
a length of k words. Then the proba-
bility that the (k + 1)st word is a word
that has already appeared exactly i times
is proportional to the total number of
occurrences of all the words that have
appeared exactly i times. This assump-
tion is much weaker than the assumption
that the probability that a particular
word occurs next is proportional to the
number of its previous occurrences, since
it leaves open the possibility that among
all words that have appeared i times the
probability of the recurrence of some may
be much higher than of others.

The second assumption states that
there is a constant probability, α (in-
dependent of k), that the (k + 1)st word
is a new word.
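
The two assumptions can be illustrated with a small simulation. The sketch below is not Simon's derivation: it uses the stronger special case mentioned above, in which each word recurs with probability proportional to its own previous count (this satisfies the first assumption but is more restrictive than it). The function names, the seed, and the choice of k = 5000 and α = 0.1 are all hypothetical.

```python
import random
from collections import Counter


def simulate_text(k: int, alpha: float, seed: int = 0) -> list:
    """Generate k word tokens (as integer word ids) under a special case
    of the two assumptions."""
    rng = random.Random(seed)
    tokens = [0]          # the first token is necessarily a new word
    next_word_id = 1
    while len(tokens) < k:
        if rng.random() < alpha:
            # Second assumption: a new word with constant probability alpha.
            tokens.append(next_word_id)
            next_word_id += 1
        else:
            # Copying a uniformly chosen earlier token repeats each word
            # with probability proportional to its previous count.
            tokens.append(rng.choice(tokens))
    return tokens


def frequency_of_frequencies(tokens: list) -> Counter:
    """Map i -> number of different words used exactly i times."""
    word_counts = Counter(tokens)
    return Counter(word_counts.values())


tokens = simulate_text(k=5000, alpha=0.1, seed=1)
f = frequency_of_frequencies(tokens)
n_k = len(set(tokens))
print("different words:", n_k, " ratio n_k/k:", round(n_k / len(tokens), 3))
print("words used once, twice, three times:", f[1], f[2], f[3])
```

The ratio of different words to total words in the simulated text should lie near the chosen α, which is the sense in which α is estimated from data later in the paper.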

From these two assumptions it is 
possible to derive the following distribu- 
tion function for the number of words 
used exactly i times (for details see 
Simon, 1955). 


$$f(i) = A\,B(i, \rho + 1). \tag{1}$$


$f^*(1)$, the number of words which appear
only once, is given by


$$f^*(1) = \frac{n_k}{2 - \alpha}, \tag{2}$$


where $n_k$ is the total number of different
words in the sample of k words, and


$$\alpha = \frac{n_k}{k}. \tag{3}$$


B is the Beta function and $\rho = 1/(1 - \alpha)$.
When α is small and $\rho \approx 1$,
Expression 1 simplifies into


$$f(i) = \frac{n_k}{i(i + 1)}; \tag{4}$$

for large i this reduces to $f(i) = n_k/i^2$,
which is equivalent to the Zipf relation-
ship that the product of frequency and
rank is a constant.

The procedure for fitting Distribution
1 to word count data is to first estimate
α from Expression 3, then calculate f*(1)
from Expression 2. All subsequent terms
may be obtained by applying the re-
cursion Formula 5 for the Beta function.


$$f^*(i) = \frac{(1 - \alpha)(i - 1)}{1 + (1 - \alpha)\,i}\,f^*(i - 1). \tag{5}$$


Application of Mathematical Model to Word Association Tests


Russell and Jenkins (1954) have pre- 
pared a compilation of responses to 100 
words from the Kent-Rosanoff Word 
Association Test. Their subjects were 
1,031 students in elementary psychology 
courses at the University of Minnesota. 
Each student was asked to give a single 
response to each of 100 stimulus words. 
A tabulation of all of the responses is 
available. The first stimulus word in 
their list is TABLE. The distribution of 
responses to this word is shown in 
Table 1. 

1,009 people responded to TABLE but
only 47 different responses were obtained;
thus α = 47/1,009 = .0466, f*(1) = 24.0,
f*(2) = 7.9, and f*(3) = 3.9. The fit of
the Yule distribution to the actual data
is remarkably good considering the
relatively small sample used. In this
case, since α is very small, we could have
estimated the values from the approxi-
mate Expression 4. Thus f(1) = 47/2 ≈ 24
and f(2) = 47/6 ≈ 8 fit the data equally
well. Also, by summing the series
$\sum_{i \ge 4} f(i)$ we find that the number
of words occurring four or more times
should be $n_k/4$, or about 11.
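
The fitting procedure for the word TABLE can be retraced numerically. The sketch below assumes that Expression 2 is f*(1) = n_k/(2 − α), Expression 3 is α = n_k/k, and Formula 5 is the Beta-function recursion f*(i) = (1 − α)(i − 1)/[1 + (1 − α)i] · f*(i − 1), as in Simon (1955); the function name `yule_expected` is hypothetical.

```python
def yule_expected(n_k: int, k: int, max_i: int):
    """Return alpha and the expected number of words used exactly
    i times for i = 1..max_i."""
    alpha = n_k / k                          # Expression 3
    expected = [n_k / (2 - alpha)]           # Expression 2: f*(1)
    for i in range(2, max_i + 1):
        # Formula 5: ratio of successive terms of the distribution.
        ratio = (1 - alpha) * (i - 1) / (1 + (1 - alpha) * i)
        expected.append(ratio * expected[-1])
    return alpha, expected


# Counts quoted in the text for the stimulus word TABLE.
alpha, expected = yule_expected(n_k=47, k=1009, max_i=3)
tail = 47 - sum(expected)   # words expected to occur four or more times

print(f"alpha = {alpha:.4f}")
print("f*(1), f*(2), f*(3) =", [round(x, 1) for x in expected])
print(f"four or more times: {tail:.1f}")
```

The values agree, to within rounding, with those quoted above, and the remainder of the 47 words falls in the "four or more" class.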

TABLE 1
DISTRIBUTION OF RESPONSES TO WORD TABLE

Number of times a      Number       Expected values
particular response    actually     from Yule
word occurs, i         observed     distribution

1                        28           24
2                         7            8
3                         3            4
4 or more                 9           11

χ² = 1.41


Since the distribution thins out very
rapidly at large values of i, for a sample
of this size we cannot expect to obtain a
fit for individual points beyond i = 4, so
that all data for i ≥ 4 are lumped to-
gether. Thus the theoretical distribution
cannot predict that, in an actual sample
of 1,009 people, one particular word will
be chosen 840 times, as was the case in
this sample, but it does predict that since
only 47 different responses are obtained,
24 words should occur only once, 8 words
twice, 4 words three times, and 11 words
four or more times. We expect therefore
that some words will occur very fre-
quently, but the distribution as a whole
is best characterized by the single pa-
rameter, α, rather than the frequency of
occurrence of the most popular response.
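
The χ² value reported with Table 1 can be reproduced from the four classes. The sketch below assumes the statistic is the ordinary Pearson χ² computed on the rounded expected values shown in the table; the paper does not state how it was computed, so this is an illustrative check only.

```python
def pearson_chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Classes i = 1, 2, 3, and "4 or more" from Table 1.
observed = [28, 7, 3, 9]
expected = [24, 8, 4, 11]

chi2 = pearson_chi_square(observed, expected)
print(f"chi-square = {chi2:.2f}")
```

Both columns sum to the 47 different responses, and under this assumption the statistic agrees with the tabled 1.41 to two decimals.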


Further examples will make this clear.


As a second example let us consider the
third stimulus word in the Russell-
Jenkins list, MUSIC. The distribution of
responses to this word is shown in
Table 2.


TABLE 2
DISTRIBUTION OF RESPONSES TO WORD MUSIC

[Table entries not recoverable; the last two classes are 9-15 and 16 or more.]

χ² = 9.58

The 973 people who responded to the word
MUSIC gave 142 different responses. Thus
α = .146. Again the theoretical distribu-
tion from Formula 1 gives a very good fit
to the actual data, and again there is no
way of predicting that in this particular
sample the most frequently used word oc-
curs 183 times. However, this is bound
to vary from sample to sample, and the
important point once more is not the
frequency of occurrence of the primary
response word but the fact that the
particular value of α = .146 describes the
entire curve. The difference between the
steep curve of Table 1 and the flat curve
of Table 2 can therefore be described in
terms of this single parameter, since the
same mathematical function was used to
fit both curves.


TABLE 3
DISTRIBUTION OF RESPONSES TO WORD BLOSSOM

[Table entries not recoverable.]

χ² = 3.79

Two more examples will make this
clear. There are actually two words in
the Russell-Jenkins list in which the
primary word occurs with almost exactly
the same frequency and yet the rest of
the curve differs because the α's are
different. These are the words BLOSSOM
and HIGH. The primary response to
BLOSSOM was given 672 times by 1,009
people and the primary response to HIGH
was given 671 times by 1,004 people.
Yet BLOSSOM elicited 63 different re-
sponse words while HIGH received 90.
Thus α for BLOSSOM was .0625 while for
HIGH it was .0896. Again, as seen from
Tables 3 and 4, the distribution function
as a whole is well represented by
Equation 1, indicating that α is a better
measure of the association power of a
word than is the frequency of the primary
response.


TABLE 4
DISTRIBUTION OF RESPONSES TO WORD HIGH

Expected values from Yule distribution, in row order:
47, 15, 7, 4, 3, 2, 2, 1, 9
[Row labels and observed values not recoverable.]
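
The comparison across the four stimulus words can be laid out in a few lines. The sketch below uses only the counts quoted in the text (number of different responses and number of respondents) and computes α = n_k/k for each word; the dictionary layout and variable names are illustrative assumptions.

```python
# (different response words n_k, number of respondents k), as quoted in the text.
counts = {
    "TABLE": (47, 1009),
    "MUSIC": (142, 973),
    "BLOSSOM": (63, 1009),
    "HIGH": (90, 1004),
}
# Frequency of the primary response for the two matched words.
primary = {"BLOSSOM": 672, "HIGH": 671}

alphas = {word: n_k / k for word, (n_k, k) in counts.items()}
ordered = sorted(alphas, key=alphas.get)   # steepest hierarchy first

for word in ordered:
    print(f"{word:8s} alpha = {alphas[word]:.4f}")

# Share of respondents giving the primary response.
shares = {word: primary[word] / counts[word][1] for word in primary}
print({word: round(share, 3) for word, share in shares.items()})
```

BLOSSOM and HIGH have nearly the same primary-response share but clearly different values of α, which is the contrast the text draws; the computed values agree with those reported to within rounding.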


Implications of Model 


The chief implication of the model has 
already been brought out in the discus- 
sion, namely, that the frequency dis- 
tribution of words associated by a sample 
population as responses to a stimulus 
word can be described by a mathematical 
model which depends on a single pa-
rameter, α, the ratio of the number of
different words to the total number of 
words presented. Deeper implications 
do not seem to be involved, since, as 
Simon points out, the probability as- 
sumptions used for the derivation of the 
Yule distribution are relatively weak and 
of the same order of generality as those
employed in deriving the Poisson, Polya,
and similar stochastic models which occur 
commonly in nature. Nevertheless, if 
any simple stochastic model can ade- 
quately describe the evolution of a dis- 
tribution function in a behavioral situa- 
tion, it will eventually lead to further 
insights into the social and psychological 
mechanisms at work. This is particularly
true when one studies the variation in the 
parameters of the model from one em- 
pirical situation to another (e.g., see 
White, 1962).


An S-R matrix is one in which the rows represent stimuli, the columns
represent responses, and the cell entries specify relationships between
stimuli and responses. A generalization matrix is a square S-R matrix
in which each column element is conditioned to a row element; the
columns of a generalization matrix specify stimulus generalization, the
rows specify response generalization. It is demonstrated that one
cannot logically separate stimulus generalization and response general-
ization in such a matrix. Hence, a decision about which type of
generalization occurs is equivalent to a restriction of one’s attention to
either rows or columns of the matrix. It is possible, however, to make
a logical separation by analyzing a number of generalization matrices.
These conclusions apply to all S-R matrices.


The specification of stimulus-response 
(S-R) relationships has been the central 
concern of many psychologists; in fact, 
an entire school of psychology has held 
that such specification is the psycholo- 
gist’s only legitimate concern. 

Specification of S-R relationships may 
be viewed as specification of the entries 
of an S-R matrix—a matrix whose rows 
represent stimuli, whose columns repre- 
sent responses, and whose cell entries 
represent the relationships between stim- 
uli (rows) and responses (columns). 
These entries may be the probability 
with which a given response is evoked 
by a given stimulus, the magnitude of a 
given response to a given stimulus (e.g., 
GSR measurement), and so on. The 
S-R matrix may describe a single or- 
ganism at a single time, a single organism 
at a number of times, etc. 

This paper is a logical examination of 
S-R matrices; the purpose is to point out 
certain problems and assumptions inher- 
ent in the analysis of S-R data. It must 
be emphasized that these points will 
be of a purely logical nature and do not 
depend on any empirical verification. 

The examination will be conducted in 
terms of a particular type of S-R matrix 
—a generalization matrix. 


Consider a set ($S_1, S_2, \ldots, S_n$) of
stimuli and a set ($R_1, R_2, \ldots, R_n$) of
responses such that for all i, $R_i$ has been
conditioned to $S_i$. Construct an S-R
matrix from these sets and their inter-
relations. Since generalization occurs
whenever some $R_i$ occurs to some $S_j$
($j \neq i$), the nondiagonal entries of our
S-R matrix are indicants of generali-
zation.


¹ The opinions expressed are those of the
author and do not necessarily reflect those of
the Veterans Administration.

But are we to consider these non- 
diagonal entries as indicants of stimulus 
generalization or as indicants of response 
generalization? Clearly, they are indi- 
cants of stimulus generalization—for a 
response conditioned to one stimulus 
($S_i$) has been evoked by another ($S_j$);
just as clearly, however, they are indi-
cants of response generalization, for a
stimulus to which one response ($R_i$) has
been conditioned has also evoked another
response ($R_j$). (See Shepard, 1958, for
definitions of stimulus and response 
generalization.) Hence: 

1. In a single S-R generalization 
matrix, stimulus generalization and re- 
sponse generalization are logically in- 
separable. 

A closely allied point follows: 

1’. Any theory or set of observations 
specifying the magnitude of one type 
of generalization automatically specifies 
the magnitude of the other. 

The columns of our S-R matrix specify 
stimulus generalization, for their entries 
are the probabilities with which (or
magnitudes with which) a response conditioned
to one stimulus is evoked by
other stimuli; similarly, the rows specify 
response generalization. Conclusion 1’ 
may be seen as following from the (some- 
what trivial) fact that specifying either 
row entries or column entries involves 
automatically specifying all the entries 
in a matrix. Our theory or observation 
enables us to fill in the rows (columns), 
and then we “read off” the columns 
(rows). 

Although stimulus and response generalization
are inseparable as logical processes,
we may wish to distinguish between
them as psychological processes. Thus, 
we make a substantive decision about 
which psychological process is responsible 
for the S-R relationships. 

2. Such a substantive decision is 
equivalent to restricting one’s observa- 
tions and theories to rows or columns 
of the generalization matrix. 

In actual practice, we often make our 
substantive decision prior to experimen- 
tation and hence restrict our data gather- 
ing (therefore our observations) to a 
single row or column. 

To illustrate Conclusions 1, 1’, and 2,
let us examine an S-R generalization 
matrix recently reported in the literature. 
DeSoto and Bosley (1962) conducted a 
paired associates learning experiment in 
which men’s names were the stimuli 
and the class labels “freshman,” “sopho- 
more,” “junior,” and “senior” were 
responses. The number of errors during 
learning (in which a name associated 
with one class was said to belong to 
another class) was taken as a measure
of response generalization. The mean
number of errors is presented in Table 1
(from DeSoto & Bosley, 1962, p. 303).
(Unfortunately, we cannot convert these 
errors to probabilities—since the authors 
do not supply the diagonal entries.) 

DeSoto and Bosley (1962) consider 
the matrix relationships to be a function 
of response generalization and thereby 
restrict their observations to rows. 
They write: 


The rows of the matrix show what might be 
called response generalization gradients . . . 
This systematic patterning of errors [in the 
rows], evidence in itself of a cognitive struc- 
ture, encourages an attempt to use frequencies 
of errors to determine distances in the struc- 
ture [p. 304]. 


Given Table 1, the authors have no 
logical method of separating stimulus 
and response generalization. Nor is 
there a logical reason for considering 
the matrix relationships to be functions 
of response generalization and hence 
to observe the systematic patterning 
of errors in rows, rather than in columns, 
to determine distance. (Happily, in 
this particular example, we obtain the 
same dimension and distances whether 
we analyze “row I scales” or “column
I scales”; for such analysis, see Coombs,
1963.)
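
The point that the same off-diagonal entries can be read either by rows or by columns can be made concrete with the DeSoto and Bosley error matrix. The sketch below stores the mean errors from Table 1 (rows are correct labels, columns are labels given by the subject) and reads them both ways; the helper names are hypothetical, and the unreported diagonal is represented by `None`.

```python
labels = ["Freshman", "Sophomore", "Junior", "Senior"]

# Mean number of errors (Table 1). Rows: correct label; columns: label
# given by the subject. Diagonal entries were not reported.
errors = [
    [None, 3.48, 2.74, 1.72],
    [2.36, None, 4.77, 3.29],
    [2.23, 3.54, None, 3.29],
    [1.78, 3.00, 4.08, None],
]


def read_row(matrix, i):
    """Entries of row i: what was said when label i was correct."""
    return [x for x in matrix[i] if x is not None]


def read_column(matrix, j):
    """Entries of column j: when label j was given in error."""
    return [row[j] for row in matrix if row[j] is not None]


row_sums = [round(sum(read_row(errors, i)), 2) for i in range(4)]
col_sums = [round(sum(read_column(errors, j)), 2) for j in range(4)]

for label, r, c in zip(labels, row_sums, col_sums):
    print(f"{label:10s} row sum = {r:5.2f}   column sum = {c:5.2f}")
```

Filling in the rows fixes every column, and vice versa, which is the sense of Conclusion 1’: the two readings are different views of one set of numbers, not separate measurements.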

TABLE 1
MEAN NUMBER OF ERRORS

                          Label given by subject
Correct label   Freshman   Sophomore   Junior   Senior     Sum

Freshman            -         3.48      2.74     1.72      7.94
Sophomore         2.36          -       4.77     3.29     10.42
Junior            2.23        3.54        -      3.29      9.06
Senior            1.78        3.00      4.08       -       8.86
Sum               6.37       10.02     11.59     8.30


When the decision to attend to rows
or columns of a generalization matrix is
not made prior to experimentation, it is
usually based on the observation of
“relatedness” (or unrelatedness) among
one of the marginal sets (that is, among
either stimuli or responses), or it is
based upon a model. Shepard (1963),
for example, has used both criteria. In
a recent article on generalization, he
writes,


If the responses are sufficiently distinctive 
that they are not themselves confused, then 
the overt errors observed during identifica- 
tion or classification learning alike should be 
attributable solely to the pair-wise confusions 
between the stimuli [p. 95]. 


In an earlier paper on generalization, 
Shepard (1957) had developed a model 
for distinguishing between pair-wise con- 
fusions of stimuli and pair-wise confu- 
sions of responses—a model based on 
viewing an S-R matrix as a product of an 
S-S matrix and an R-R matrix. (While 
we shall not attempt to explicate this 
model here, it is important to what 
follows to note that it is based on making 
various pairings of stimuli and responses.) 

A generalization matrix is, of course, 
a specific type of S-R matrix—a square 
one in which the diagonal entries have 
a certain psychological meaning. The 
foregoing analysis is applicable, mutatis 
mutandis, to all other types of S-R 
matrices; among the more obvious, are 
included stimulus concept formation 
matrices (fewer columns than rows) and 
response concept formation matrices 
(fewer rows than columns).² Much
psychological investigation may be de-
scribed by S-R matrices.³


² S-R matrices are, in turn, instances of a
more general class of matrices that Coombs 
(1962) terms conditional proximity matrices 
of elements from distinct sets. The rows of such 
a matrix are elements of one set, the columns 
elements of another set, and the cell entries 
represent interset relationships. This class 
of matrices includes, in addition to S-R 
matrices, matrices specifying subjects’ scores 
on tests, matrices specifying people’s prefer- 
ences for objects, matrices specifying attitude 
items’ placement in categories, and so on. 
The analysis of S-R matrices presented here 
rests on the fact that the cell entries in this 
class of matrices may be viewed as functions 
of either set of marginals; hence, this analysis 
is applicable to all matrices in this class. 

³ In fact, a behaviorist could maintain that
the types of psychological investigations 
possible are determined by the types of S-R 
matrices one can construct.

An S-R matrix of current interest is
that describing perceptual defense. The 
rows of such a matrix are tachistoscopic 
inputs, the columns are perceptual indi- 
cators (Goldiamond, 1958), and the 
cell entries are the probabilities with 
which a particular input evokes a par- 
ticular indicator. Specification of the 
rows of this matrix is specification of 
response bias, while specification of the 
columns is specification of perceptual 
defense. From Conclusion 1, we see 
that we cannot separate perceptual de- 
fense from response bias by analyzing 
data capable of being represented in a 
single S-R matrix. From Conclusion 1’, 
we see that a complete specification of 
either the process of defense or the proc- 
ess of bias necessarily involves a com- 
plete specification of the other process. 
From Conclusion 2, we note that the 
controversy over whether input or re- 
sponse is responsible for experimental 
findings is a controversy over whether 
to attend to rows or columns of the 
(potential) data matrix. The early work 
on perceptual defense attended to columns,
on the assumption that the stimuli
were responsible; the later work attended
to rows, owing to the discovery of
relatedness among response patterns and 
substitutions. In a most recent attempt 
to separate row and column effects, 
Zajonc (1962) has used various pairings 
of threatening and nonthreatening stim- 
uli and responses. His technique is 
based on the same basic logic as is 
Shepard’s (1957) model. 

But analyzing data consisting of 
various S-R pairings is equivalent to 
analyzing more than one S-R matrix. 
Hence, Shepard’s and Zajonc’s tech- 
niques—which enable us to make a 
logical separation of the effects of stimuli 
and the effects of responses—illustrate 
the following principle. 

3. We may logically separate stimulus 
effects from response effects by analyzing 
S-R matrices with the same sets of 
stimuli and responses but with different 
pairings. 

The method of separation will vary in 
varying contexts. 

It is not, of course, always possible
to construct S-R matrices with the same
set of stimuli and responses but with
different pairings. Many areas of investigation
involve natural (or unavoidable)
pairings. When it is possible,
however, it would appear to be a more 
desirable method of separating stimulus 
and response effects than is the method 
of restricting one’s attention to rows or 
columns of the S-R matrix.


The punishment procedure is one in which an aversive stimulus is
contingent upon the occurrence of a response. Various theories of 
the mechanism through which punishment exerts its influence on 
behavior emphasize the unconditioned fear response, the unconditioned 
skeletal response, the escape response, the similarity between the 
conditions of punishment and the conditions of training, the correla- 
tion of response and punishment, and the possible sources of reinforce- 
ment for nonresponse. The major problem of this paper was to 
determine whether any of the proposed mechanisms, or a combination 
of them, are sufficient to account for the varied effects of punish- 
ment on behavior. A systematic examination of the data led to the 
conclusions that if an aversive stimulus is contingent upon a response 
there will be greater suppression (or less facilitation) of the response 
than if the aversive stimulus is not contingent upon the response, 
but that the aversive stimulus, itself, may result in response facilitation
under some conditions and response suppression in others.


In 1913, Thorndike presented the 
view that both reward and punishment 
had simple and clearly predictable ef- 
fects. He wrote, “When a modifiable 
connection between a situation and a 
response is made and is accompanied 
or followed by a satisfying state of af- 
fairs, that connection’s strength is in- 
creased: When made and accompanied 
or followed by an annoying state of 
affairs, its strength is decreased 
[Thorndike, 1913, p. 4].” With re- 
spect to reward, his position remained 
essentially unchanged in his later writ- 
ings and it is the dominant position 
today. With respect to punishment, 
however, Thorndike (1932) was con- 
fronted with numerous instances in 
which punishment did not weaken the 
strength of a response. Thus he reported,
“Rewarding a connection always
strengthened it substantially;
punishing it weakened it little or not 
at all [p. 58].” Considerable uncer- 
tainty remains today regarding the ef- 
fect of punishment and there does not 
appear to be any single reliable effect. 
Much experimental evidence indicates 
that punishment decreases the probabil- 
ity of occurrence of a response or in- 
creases its latency, but there is also 
much conflicting evidence. In some 
experiments punishment has only a 
temporary suppressing effect on a re- 
sponse, or none at all, and in other 
experiments punishment actually has
the paradoxical effect of increasing the
strength of the response it follows.
The purpose of this paper is to describe
the conditions under which the
various effects of punishment are observed,
with the hope that a systematic
organization of the data may lead to
increased theoretical understanding of 
the phenomenon. 


DEFINITION OF A PUNISHMENT 


It is far less difficult to define the 
punishment procedure than it is to de- 
fine a punishment. The punishment 
procedure is one in which a noxious 
stimulus is contingent upon the occur- 
rence of a response, but the definition 
of the key concept, “noxious stimulus,” 
presents serious problems. (In this 
paper the terms “noxious stimulus” 
and “aversive stimulus” will be used 
interchangeably, and such a stimulus 
will be called a “punishment” if it is 
contingent upon a response.) In most 
experiments on the effects of punish- 
ment, the subject is administered an 
electric shock of some intensity and of 
brief duration immediately following 
or accompanying a specified response. 
At low intensities it is meaningful to 
ask whether or not the electric shock 
was aversive. 

Mowrer (1947) defined a punish- 
ment as “a relatively sudden and pain- 
ful increase of stimulation following 
the performance of some act [p. 136],” 
but specification of the aversive
stimulus in neither physical nor
subjective terms has led to precision.
Most definitions of a noxious stimulus 
involve some reference to behavior, un- 
conditioned or conditioned. The aver- 
sive quality of a stimulus can be de- 
fined and scaled in terms of the effect 
of its presentation on certain uncondi- 
tioned autonomic or skeletal responses. 
The former would be particularly rele- 
vant if “fear” were critical to the 
punishment procedure; the latter would 
be particularly relevant if “competing 
responses” were critical to the punish- 
ment procedure. 

In the case of definitions of a noxious
stimulus in terms of response-contingent
procedures, there are logically
four alternatives. The effect of a
response can be to remove a stimulus 
or prolong its absence, or it can be to 
produce a stimulus or prolong its pres- 
ence. In the case of noxious stimuli, 
the first three procedures are called 
escape, avoidance, and punishment, re- 
spectively. The fourth procedure has 
no commonly accepted name, so it will 
be called the “preservation procedure.” 
(a) The escape procedure is one in
which the noxious stimulus is present 
and the response terminates it, (b) 
the avoidance procedure is one in which 
the noxious stimulus is absent and the 
response prolongs its absence, (c) the 
punishment procedure is one in which 
the noxious stimulus is absent and the 
response produces it, and (d) the 
preservation procedure is one in which 
the noxious stimulus is present and 
the response prolongs its presence. The 
noxious stimulus can be defined in 
terms of any of these procedures. 
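
The four alternatives form a two-by-two classification: whether the noxious stimulus is present before the response, and whether the response changes that state or prolongs it. The sketch below encodes that classification; the function name `classify_procedure` and its two boolean arguments are hypothetical conveniences, not terminology from the literature.

```python
def classify_procedure(stimulus_present: bool, response_changes_state: bool) -> str:
    """Name the response-contingent procedure for a noxious stimulus.

    stimulus_present: is the noxious stimulus on before the response?
    response_changes_state: does the response switch the stimulus
        (off if it was on, on if it was off) rather than prolong
        its current state?
    """
    if stimulus_present:
        # Terminating a present stimulus vs. prolonging its presence.
        return "escape" if response_changes_state else "preservation"
    # Producing an absent stimulus vs. prolonging its absence.
    return "punishment" if response_changes_state else "avoidance"


for present in (True, False):
    for changes in (True, False):
        print(present, changes, "->", classify_procedure(present, changes))
```

Laid out this way, punishment and escape differ only in the initial state of the stimulus, which is why a noxious stimulus could in principle be defined by any of the four cells.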
Thorndike (1913) defined the noxious
stimulus in terms of both the escape
and the preservation operations.
He described the punishment pro- 
cedure as one in which a modifiable 
connection between a situation and a 
response is accompanied or followed 
by an annoying state of affairs, and 
an annoying state of affairs as “one 
which the animal does nothing to pre- 
serve, often doing things which put an 
end to it [p. 2].” Several recent the- 
oretical treatments of punishment have 
employed the escape operation as the 
basis for the definition of a noxious 
stimulus (Dinsmoor, 1954; Skinner, 
1953). Why have not psychologists 
chosen to say a stimulus is a punish- 
ment if it suppresses behavior? There 
are no logical grounds for defining a 
noxious stimulus in terms of the es- 
cape operation rather than in terms of 
the punishment operation, but it appears
at the present time that the effects
of the escape operation are far
more reliable than those of the punishment
operation. Because of the apparently
varied effects of punishment,
some indirect definition of the noxious 
stimulus used in the punishment situ- 
ation is currently favored by most psy- 
chologists in their theoretical remarks. 
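
The four procedures distinguished above differ only in whether the noxious stimulus is present before the response and whether it is present after the response. As a minimal sketch of that classification (the function name `classify_procedure` and its two Boolean arguments are our own illustrative labels, not established terminology), the scheme can be written as a lookup:

```python
def classify_procedure(present_before: bool, present_after: bool) -> str:
    """Name the procedure from the state of the noxious stimulus.

    present_before: stimulus is on when the response occurs (assumed label).
    present_after:  stimulus is on as a consequence of the response (assumed label).
    """
    table = {
        (True, False): "escape",        # present; response terminates it
        (False, False): "avoidance",    # absent; response prolongs its absence
        (False, True): "punishment",    # absent; response produces it
        (True, True): "preservation",   # present; response prolongs its presence
    }
    return table[(present_before, present_after)]
```

For instance, `classify_procedure(False, True)` returns `"punishment"`, the case with which this paper is concerned.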

In practice, however, very few ex- 
perimenters have taken seriously the 
empirical definition of noxious stimulus 
in terms of the escape operation, and 
thus they have not actually determined 
whether or not their punishing stimulus 
would really lead to escape learning. 
In some cases the punishing stimulus 
is obviously sufficient to produce es- 
cape learning. For example, in one 
experiment a rat could terminate a 
punishment only by performing a spe- 
cific escape response (Kamin, 1959, 
Experiment I). In many cases it is 
not clear whether or not the punishing 
stimulus would have produced escape 
learning, whereas in other cases the 
punishing stimulus was selected to be 
nonescapable, e.g., the punishment of 
less than 100-millisecond duration 
used on pigeons by Azrin (1960). It 
is doubtful that a pigeon could learn 
to escape from shocks of such brief 
duration although, by other definitions, 
the stimuli could be considered nox- 
ious. 

The use of the single concept of 
noxious stimulus to embrace the pro- 
cedures of aversive classical condition- 
ing, escape training, avoidance train- 
ing, punishment, and preservation may 
be a costly parsimony. In a particular 
situation there will be a measurable 
threshold of intensity of the punish- 
ment necessary to obtain some re- 
sponse suppression. Is this also the 
threshold of fear? Is it the weakest 
aversive stimulus that will elicit com- 
peting responses? Is it the threshold 
for escape or for avoidance? Further 
empirical work must be done to establish 
the relationship between the effect 
of variations in the physical dimensions 
of the aversive stimulus in the 
punishment procedure on some measure 
of punished behavior and the effect 
of similar variations of the aversive 
stimulus on behavior in other negative 
reinforcement procedures. This could 
lead to important statements regarding 
the conditions under which a stimulus 
will serve as an effective punishment. 
Of course, the effectiveness of 
a punishment may depend on numerous 
factors other than the severity of 
the stimulus, e.g., the effortfulness of 
the response, the amount and kind of 
previous training, the drive level, the 
probability that the noxious stimulus 
and the positive reinforcement will fol- 
low the response. Similarly, there are 
many factors other than the severity 
of the stimulus that determine the ef- 
fectiveness of the procedures of aver- 
sive classical conditioning, preserva- 
tion, escape training, and avoidance 
training. For these reasons, no indi- 
rect definition of a noxious stimulus 
can be made with confidence. There- 
fore, in this paper, the punishment 
procedure will refer to response-con- 
tingent presentation of stimuli that 
vary tremendously in severity. 

We have found it useful to dis- 
tinguish between two types of training 
conditions and two types of extinction 
conditions as follows: (a) regular- 
training (or, training) refers to a 
procedure in which positive reinforce- 
ment is contingent upon a response; 
(b) punishment-training refers to a 
procedure in which both positive reinforcement 
and an aversive stimulus 
are contingent upon a response; (c) 
punishment-extinction refers to a pro- 
cedure in which an aversive stimulus 
is contingent upon a response; and (d) 
regular-extinction (or, extinction) re- 
fers to a procedure in which neither 
positive reinforcement nor an aversive 
stimulus is contingent upon a response. 
Finally, we have found it useful to 
differentiate between training based on 
positive reinforcement (positive instru- 
mental responses) and that based on 
negative reinforcement (escape and 
avoidance responses). 
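
The two training and two extinction conditions defined above are fully specified by which consequences are contingent upon the response. The following sketch restates them as a table; the identifiers `CONDITIONS` and `consequences` are hypothetical names introduced only for illustration.

```python
from typing import Dict, Tuple

# Each condition maps to (positive reinforcement contingent on the response,
#                         aversive stimulus contingent on the response).
CONDITIONS: Dict[str, Tuple[bool, bool]] = {
    "regular-training": (True, False),
    "punishment-training": (True, True),
    "punishment-extinction": (False, True),
    "regular-extinction": (False, False),
}


def consequences(condition: str) -> Tuple[bool, bool]:
    """Return (reinforcement, aversive stimulus) that follow a response."""
    return CONDITIONS[condition]
```

Read this way, punishment-extinction differs from regular-extinction only in the second entry, and punishment-training differs from regular-training in the same way.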


THEORIES OF PUNISHMENT 

Before making a detailed consider- 
ation of the effects of punishment on 
behavior, it may be useful to consider 
the possible mechanisms through which 
this procedure may be effective. Con- 
sider a rat that has learned to press a 
lever for food reinforcement when an 
auditory stimulus occurs. On the first 
trial of punishment-training, the audi- 
tory stimulus came on, the rat moved 
toward the lever and pressed it, and 
an electric shock began. The rat 
squeaked and jumped back, the shock 
and the auditory stimulus terminated, 
and the rat ate the food. Later in 
punishment-training, the auditory stim- 
ulus came on, the rat moved toward 
the lever but it did not press it. Why 
did the rat change its behavior after 
the introduction of the punishment? 

On the first trial of punishment-training 
the following events occurred: 
There was a discriminative stimulus 
under the control of the experimenter 
and response-produced stimuli under 
the control of the subject. These were 
followed by a lever response that was 
followed by punishment. The onset 
of the punishment was followed by 
emotional and skeletal responses that 
were followed by the termination of 
the punishment. Which of these 
events were necessary to produce the 
observed change in behavior? 


Theories Not Necessarily Involving the 
Correlation of the Response and Pun- 
ishment 


In the punishment procedure the 
response always intervenes between 
the discriminative stimulus and the 
punishment, but is this response of any 
consequence? Empirically, the problem 
is to compare the performance of 
an experimental group under the pun- 
ishment procedure and that of a con- 
trol group receiving the same sequence 
of discriminative stimuli and punish- 
ments, but uncorrelated with their re- 
sponses. If the two groups are simi- 
lar in their performance, then some 
theory not involving the correlation of 
response and punishment will be re- 
quired to account for the effects of 
punishment. Four suggestions have 
been offered to explain the effects of 
punishment that do not require a cor- 
relation of the response and punish- 
ment. All four mechanisms can ac- 
count for either response facilitation or 
response suppression under conditions 
of punishment. The fear hypothesis 
emphasizes the emotional responses 
elicited by the punishment; the com- 
peting response hypothesis emphasizes 
the skeletal responses elicited by the 
punishment; the escape hypothesis em- 
phasizes the responses which occur 
shortly before termination of the pun- 
ishment. Any of these responses 
(emotional, skeletal, or escape) may 
be postulated to be produced by dis- 
criminative stimuli under the control 
of the experimenter. If appeal is made 
to response-produced stimuli, e.g., the 
exteroceptive and interoceptive cues of 
anticipatory responding, then the effect 
of punishment will be a function of its 
correlation with the response. The 
final suggestion of a theoretical mech- 
anism which does not require a cor- 
relation between response and punish- 
ment is the discrimination hypothesis 
which emphasizes the similarity be- 
tween the conditions of punishment and 
the conditions of training. 

The fear hypothesis. Some psy- 
chologists would emphasize the impor- 
tance of the unconditioned fear re- 
sponse elicited by the punishment that, 
by the principles of classical condi- 
tioning, may occur to the discriminative 
stimuli or to the response-produced 
stimuli. For example, Estes (1944) wrote, 


It is clear, then, that a disturbing or traumatic 
stimulus arouses a changed state of the organ- 
ism of the sort commonly termed “emotional” 
and that any stimulus present simultaneously 
with the disturbing stimulus becomes a condi- 
tioned stimulus capable of itself arousing the 
state on subsequent occasions [p. 36]. 


If an aversive stimulus is administered 
to the subject in a particular stimulus 
situation it may depress its rate of re- 
sponse for positive reinforcement in 
the presence of that situation (Estes 
& Skinner, 1941). But there are many 
instances in which fear increases re- 
sponse strength. For example, the 
rate of avoidance responding may be 
increased by the presentation of a 
stimulus associated with an aversive 
stimulus (Sidman, Herrnstein, & Con- 
rad, 1957). 

The competing response hypothesis. 
Some psychologists would emphasize 
the importance of the unconditioned 
skeletal responses elicited by the pun- 
ishment that, by the principles of clas- 
sical conditioning, may occur to the 
discriminative stimuli or to the re- 
sponse-produced stimuli. For exam- 
ple, Guthrie (1935) wrote, “Punish- 
ment achieves its effects . . . by forcing 
the animal or the child to do some- 
thing different [p. 158]” and “To train 
a dog to jump through a hoop, the 
effectiveness of punishment depends 
on where it is applied, front or rear [p. 
160].” Thus if the responses elicited 
by the aversive stimulus are incom- 
patible with the punished act, punish- 
ment will suppress the act; but if the 
responses elicited by the aversive stim- 
ulus are similar to the punished act, 
punishment may facilitate the act. 

The escape hypothesis. Some psy- 
chologists would emphasize the impor- 
tance of the response that resulted in 
escape from punishment that, by the 
principle of generalization, may occur 
to the discriminative stimuli or to the 
response-produced stimuli. As Gwinn 
(1949) has written, “if the response 
to the punishing stimulus is compatible 
with the punished act, punishment will 
facilitate rather than inhibit an act 
motivated by fear [p. 260].” Although, 
in the example given, the unconditioned 
skeletal response elicited 
by the onset of the punishment was 
also the escape response, this need not 
generally be true. If the punishment 
is of fixed duration of several seconds, 
the subject may be adventitiously re- 
inforced for a particular response; in 
other cases a particular escape re- 
sponse may be required to terminate 
the punishment. 

The discrimination hypothesis. Some 
psychologists would emphasize the 
similarity between the conditions of 
punishment and the conditions of train- 
ing. With the discrimination hypothe- 
sis, the aversive stimulus of the pun- 
ishment procedure is considered as a 
response-produced cue with the same 
functions as nonaversive stimuli fol- 
lowing a response. If punishment re- 
instates a condition of training it may 
facilitate the response; if punishment 
results in a change from the conditions 
of training a generalization decrement 
should be observed. Holz and Azrin 
(1962) have written, “Whenever pun- 
ishment is differentially associated with 
reinforcement, a discriminative prop- 
erty will probably influence the effec- 
tiveness of the punishment.” If pun- 
ishment is correlated with positive 
reinforcement, response rate may be 
increased; if it is correlated with non- 
reinforcement, response rate may de- 
crease. 


Theories Necessarily Involving the 
Correlation of Response and Punish- 
ment 


The theories described above all 
assume that the correlation between 
response and punishment may be irrelevant 
for the effect of punishment 
on behavior. Whenever the effect of 
a response-contingent noxious stimulus 
and a response-independent noxious 
stimulus are empirically shown to be 
identical, recourse must be made to one 
of these theories. In most cases, how- 
ever, the performance under conditions 
of response-contingent punishment is 
radically different from that under re- 
sponse-independent aversive stimula- 
tion. For these cases, two theoretical 
mechanisms have been proposed, both 
of which account for response suppres- 
sion (but not facilitation) under con- 
ditions of punishment. 

The suppression hypothesis. Some 
psychologists would emphasize the cor- 
relation between the instrumental re- 
sponse and punishment, and postulate 
some form of inhibition for responding 
in the punishment situation. Thorn- 
dike’s (1913) original statement of 
the law of effect involved suppression 
by punishment. He wrote, “When a 
modifiable connection between a situ- 
ation and a response is made . . . and 
accompanied or followed by an annoy- 
ing state of affairs, its strength is de- 
creased [p. 4].” 

The avoidance hypothesis. Some 
psychologists, also emphasizing the 
correlation between the instrumental 
response and punishment, reject the 
notion that punishment decreases re- 
sponse strength. Instead of postu- 
lating some form of inhibition for re- 
sponding in the punishment situation, 
they postulate some form of reinforce- 
ment for not responding in the punish- 
ment situation. Mowrer (1947), for 
example, wrote: 


The performance of any given act normally 
produces kinesthetic (and often visual, 
auditory, and tactual) stimuli which are 
perceptible to the performer of the act. If 
these stimuli are followed a few times by a 
noxious (‘unconditioned’) stimulus, they will 
soon acquire the capacity to produce the 
emotion of fear. When, therefore, on subsequent 
occasions the subject starts to perform 
the previously punished act, the resulting 
self-stimulation will arouse fear; and 
the most effective way of eliminating this 
fear is for the subject to stop the activity 
which is producing the fear-producing stimuli 
[p. 136]. 


The application of the avoidance hy- 
pothesis usually involves (a) a classi- 
cal conditioning process involving 
experimenter-controlled stimuli or sub- 
ject-produced stimuli in association 
with punishment and (b) an instru- 
mental learning process involving some 
kind of reinforcement for a nonre- 
sponse. This may be reduction or 
termination of experimenter-controlled 
stimuli, of subject-produced stimuli, of 
fear, or of expectation of punishment. 
The latter three sources of reinforce- 
ment are currently indistinguishable. 
No major theorist has relied ex- 
clusively upon a single explanation of 
the effects of punishment. The treat- 
ment of punishment by Estes (1944) 
is usually associated with the fear hy- 
pothesis, that of Guthrie (1935) with 
the competing response hypothesis, that 
of Mowrer (1960) with the avoidance 
hypothesis, etc., but all of them have 
used more than one mechanism to ac- 
count for the observed phenomena of 
punishment. Even Dinsmoor (1955), 
in an attempt to interpret as many of 
the effects of punishment as possible 
in terms of the avoidance hypothesis, 
found it necessary to rely upon (a) 
the competing response hypothesis to 
account for Estes’ (1944, Experiment 
I) finding of an equal effect of re- 
sponse-contingent and response-inde- 
pendent aversive stimuli, (b) the es- 
cape hypothesis to account for Gwinn’s 
(1949) and Whiteis’ (1955) finding 
that punishment increased the resist- 
ance to extinction of acts motivated by 
fear, and (c) the discrimination hy- 
pothesis and the fear hypothesis to ac- 
count for Muenzinger’s (1934) finding 
of faster learning in a visual discrimination 
task if subjects were punished 
for correct responses than if they were 
not punished. Although it has not 
been possible to deal with all of the 
phenomena of the punishment pro- 
cedure with the use of only one of the 
theories listed above, it may be pos- 
sible to account for the data with fewer 
than all of them. Hopefully, an ex- 
amination of the data will lead to a 
new synthesis. 


CONTIGUITY BETWEEN RESPONSE 
AND PUNISHMENT 


Many psychologists believe that, to 
be effective, a punishment must be pre- 
sented almost immediately after the 
act. For example, Watson (1924) 
wrote, “The idea that a child’s future 
bad behavior will be prevented by giv- 
ing him a licking in the evening for 
something he did in the morning is 
ridiculous [p. 183],” but he defended 
the efficacy of mild punishment, “pro- 
vided the child is caught in the act and 
the parent can administer the rap at 
once in a thoroughly objective way.” 
In his influential monograph on pun- 
ishment, Estes (1944) challenged this 
position. He suggested that, in many 
instances, the effect of punishment 
can be explained in terms of the gen- 
eral emotionalizing effect of the aver- 
sive stimulus, rather than in terms of 
the correlation between the aversive 
stimulus and any particular response. 
Our understanding of the manner in 
which punishment affects behavior is 
considerably enhanced by evidence re- 
garding the relevance of the correlation 
between the response and punishment. 


Contingent versus Noncontingent Pro- 
cedures 


One approach to the problem of the 
relevance of the correlation between 
the aversive stimulus and the response 
involves a comparison of the performance 
of experimental subjects punished 
for a particular response and the performance 
of control subjects that receive 
the same aversive stimuli uncorrelated 
with the response. If the 
performance of the experimental and 
control subjects is similar, then some 
theory not involving the correlation of 
response and punishment will be re- 
quired to account for the effects of 
punishment. 

In one experiment (Estes, 1944, Ex- 
periment B) rats were trained to press 
a lever on a 1-minute variable interval 
schedule of reinforcement. The subjects 
with a 10-minute session of punishment-extinction, 
in which each lever-press 
response was followed by a brief 
“severe” shock, showed suppression 
of the response relative to those with a 
10-minute session of regular extinc- 
tion. In another experiment (Estes, 
1944, Experiment I), after training on 
the 1-minute variable interval schedule 
of reinforcement, rats either received 
a 10-minute session of regular extinc- 
tion or a 10-minute session with shocks 
administered at intervals of about 30 
seconds, but not during or immediately 
following a lever-press response. Again, 
the group receiving shock showed sig- 
nificantly more suppression of the 
leverpress response than the group re- 
ceiving the regular-extinction proced- 
ure, but the important observation was 
that the performance of the two groups 
receiving shocks was similar. The 
data are not sufficient to say that the 
performance under response-contingent 
punishment was exactly the same 
as under response-independent aver- 
sive stimulation, but the degree of sup- 
pression of response rate was certainly 
of the same order of magnitude. These 
results led a number of investigators 
to design experiments to determine 
whether the correlation between re- 
sponse and noxious stimulus is rele- 
vant to the suppression often found in 
punishment situations. 
Hunt and Brady (1955) performed 
an important experiment that demon- 
strated some of the differences be- 
tween the contingent and noncontin- 
gent procedures. They trained rats to 
press a lever on a 1-minute variable interval 
schedule of reinforcement during 
a number of 12-minute sessions, 
and then assigned the subjects to two 
kinds of groups. In the case of the 
subjects in the Punishment group, all 
responses during a 3-minute CS were 
punished with a 1.5 milliampere shock; 
in the case of the subjects in the CER 
group, no responses during the CS 
were punished but there were two 
momentary 1.5 milliampere shocks at 
the time of CS termination. These 
conditioning sessions were interspersed 
with adaptation sessions in which no 
CS or shock was used, and they were 
followed by 10 days of regular ex- 
tinction. The results showed almost 
complete suppression in response rate 
during the CS for the Punishment 
group, in which the shocks were con- 
tingent on the response, and for the 
CER group, in which the shocks were 
contingent upon the stimulus. There 
were, however, a number of reliable 
differences between the treatments: 
(a) the amount of suppression dur- 
ing the adaptation days (in the absence 
of the CS) was greater for the CER 
group than for the Punishment group, 
(b) the resistance to extinction was 
considerably greater for the CER group 
than for the Punishment group, and 
(c) the behavior of rats in the two 
groups was radically different. In the 
CER group the dominant response pat- 
tern was crouching, freezing, and defe- 
cating; in the Punishment group the 
dominant response pattern was abor- 
tive leverpressing. These results are 
similar to those of the earlier study 
by these investigators (Hunt & Brady, 
1951). 

Azrin (1956) also demonstrated differences 
between a situation in which 
the aversive stimulus was contingent 
upon a response and a situation in 
which the aversive stimulus was not 
contingent upon the response. He 
trained pigeons on a 3-minute variable 
interval schedule of reinforcement and 
then alternated an orange and a blue 
light on the response key every 2 min- 
utes. In the presence of a blue light on 
the key there was a continuation of the 
reinforcement procedure; in the pres- 
ence of an orange light on the key 
there was the addition of a punishment 
procedure. The punishment was 600 
volt ac for .5 second through 120,000 
ohms in series with a grid. The nox- 
ious stimulus was either contingent 
upon a response (scheduled to follow 
the first response after a fixed or 
variable length of time after the onset 
of the orange light) or it was not con- 
tingent upon the response (scheduled 
to occur a fixed or variable length of 
time after the onset of the orange 
light). The results indicated that, 
both in the case of the fixed and vari- 
able interval, the response rate was 
dramatically lower in the case of the 
contingent than in the noncontingent 
situation. 
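
The difference between the two schedules in this design lies only in what triggers the shock once the scheduled interval has elapsed. The sketch below illustrates that rule under stated assumptions: times are seconds from the onset of the orange light, a response occurring exactly at the scheduled time counts as "after" it, and the names `shock_time`, `response_times`, and `interval` are hypothetical labels of our own rather than anything taken from the original report.

```python
from typing import Optional, Sequence


def shock_time(response_times: Sequence[float],
               interval: float,
               contingent: bool) -> Optional[float]:
    """Time at which the shock is delivered, or None if it never is.

    Noncontingent: the shock occurs when the interval elapses,
    whatever the subject is doing.
    Contingent: the shock follows the first response made once the
    interval has elapsed (assumption: a response at exactly
    `interval` qualifies).
    """
    if not contingent:
        return interval
    for t in sorted(response_times):
        if t >= interval:
            return t
    return None  # the subject withheld responding, so no shock occurred
```

With responses at 10, 50, and 70 seconds and a 60-second interval, the noncontingent shock falls at 60 seconds and the contingent shock at 70 seconds; a subject that stops responding before 60 seconds receives no contingent shock at all.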

Both the studies of Azrin (1956) 
and Hunt and Brady (1955) demon- 
strate that the contingency between re- 
sponse and punishment is a relevant 
dimension of the punishment situation. 
The results of these studies suggest 
that the contingent punishment pro- 
cedure, relative to the noncontingent 
procedure, produces (a) greater sup- 
pression of the punished response, (b) 
less suppression of other responses, 
and (c) less resistance to extinction. 
Further comparisons of the contingent 
and noncontingent procedures should 
be made. Based on the data from 
Hunt and Brady (1955) and Azrin 
(1956), the amount of suppression 
should be greater for the contingent 
group, especially at low levels of punishment 
intensity. If the level of shock 
intensity were sufficiently high, both 
the contingent and noncontingent 
groups may hardly respond at all, but 
at lower levels of shock intensity, the 
groups may be more easily differenti- 
ated. In a parametric investigation of 
shock intensity in CER, Annau and 
Kamin (1961) noted that a level of 
shock insufficient to produce CER (.28 
milliampere) was sufficient to produce 
a punishment effect. Presumably at 
such low levels of punishment in- 
tensity, the most striking differences 
between the contingent and noncon- 
tingent procedures are to be found. 
None of the previous studies com- 
paring the effect of contingent and non- 
contingent shocks on response rate 
have equalized the number and tem- 
poral distribution of shocks received 
by the two groups, although the in- 
vestigators have believed that it is 
improbable that the differences ob- 
served were a result of this confound- 
ing variable. Unfortunately, the use 
of a matched (yoked) control pro- 
cedure to equalize the number and 
temporal distribution of the shocks re- 
ceived by subjects under the contin- 
gent and noncontingent procedures 
could lead to serious errors. Con- 
sider an experiment in which subjects 
that have been trained to press a lever 
are paired on some basis and one of 
the two members of each pair is ran- 
domly selected as the experimental 
subject with the other member of the 
pair as its matched control subject. 
An aversive stimulus then can be de- 
livered to both experimental and con- 
trol subjects immediately following each 
response by the experimental subject. 
As a result a control subject receives 
the same aversive stimuli as its 
matched experimental subject but the 
aversive stimuli are presented to it in 
a way not necessarily correlated with 
the response. The problem is that a 
reliable difference between the experi- 
mental and control groups could 
emerge even if the temporal relation- 
ship between response and punishment 
were irrelevant, assuming only that 
there are individual differences in the 
effectiveness of the shock in suppres- 
sing behavior. Typically, reliable in- 
dividual differences may be demon- 
strated between a pair of subjects, so 
that either the experimental subject or 
its matched control subject will be 
more affected by the aversive stimu- 
lus. At levels of shock intensity that 
result in total suppression after a few 
applications, if the control subject is 
more affected than the experimental 
subject, the control subject will stop 
responding first and the experimental 
subject will continue to respond a few 
more times until it has produced 
enough additional aversive stimuli to 
suppress its own response. If the ex- 
perimental subject is more affected 
than the control subject, however, the 
experimental subject will stop respond- 
ing first, and the control subject will 
continue to respond indefinitely since 
the experimental subject is delivering 
no additional shocks to it. A sta- 
tistical test that did not take into con- 
sideration the magnitude of the dif- 
ference, e.g., the sign test, would meet 
this objection but it would require the 
unreasonable assumption that the ef- 
fectiveness of the punishment is con- 
stant in time for a given subject. 
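
The argument against the yoked control can be illustrated with a toy simulation. It is a sketch under strong simplifying assumptions of our own: each subject has one response opportunity per time step, each subject stops responding permanently after receiving a fixed number of shocks (its "tolerance"), and the contingency between response and shock plays no part in the model. All names (`run_yoked_pair`, `summarize`, the tolerance parameters) are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def run_yoked_pair(exp_tolerance: int, ctrl_tolerance: int,
                   session_steps: int) -> Tuple[int, int]:
    """Return (experimental responses, control responses) for one pair.

    A subject responds on every step until it has received as many
    shocks as its tolerance. Shocks are delivered to BOTH members
    whenever the experimental subject responds.
    """
    exp_shocks = ctrl_shocks = 0
    exp_responses = ctrl_responses = 0
    for _ in range(session_steps):
        exp_responds = exp_shocks < exp_tolerance
        ctrl_responds = ctrl_shocks < ctrl_tolerance
        exp_responses += int(exp_responds)
        ctrl_responses += int(ctrl_responds)
        if exp_responds:  # only the experimental subject triggers shocks
            exp_shocks += 1
            ctrl_shocks += 1
    return exp_responses, ctrl_responses


def summarize(pairs: List[Tuple[int, int]],
              session_steps: int) -> Tuple[float, float, int]:
    """Group means and the number of pairs in which the control responds more."""
    results = [run_yoked_pair(e, c, session_steps) for e, c in pairs]
    exp_mean = mean(r[0] for r in results)
    ctrl_mean = mean(r[1] for r in results)
    ctrl_higher = sum(1 for r in results if r[1] > r[0])
    return exp_mean, ctrl_mean, ctrl_higher
```

With two pairs whose tolerances are simply exchanged, `summarize([(3, 2), (2, 3)], 100)`, the experimental mean is 2.5 responses and the control mean is 51, although the control subject responds more in only one of the two pairs. The group means diverge while the direction of the within-pair difference stays balanced, which is why a test that ignores magnitude, such as the sign test, is less exposed to this artifact.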
Lichtenstein (1950) has demon- 
strated that long-lasting feeding inhi- 
bitions may be developed in dogs by 
punishing the act of eating. Ten dogs 
were first trained to eat pellets in a 
stock, and then they were punished for 
eating. Under this treatment food was 
presented to the dog and, if it began to 
eat, an 85-volt ac electric shock of 
2-second duration was administered to 
its foreleg. These subjects inhibited 
the act of eating after a mean of 1.7 
shocks, and they did not eat again in 
the stock on three subsequent days of 
20 trials per day. Three other dogs 
that received the aversive stimulation 
when the food was presented (i.e., be- 
fore they began to eat) did not form a 
feeding inhibition. Apparently, the 
feeding inhibition was considerably 
more pronounced if the shock was ad- 
ministered simultaneously with the 
response than if it was administered 
immediately after the presentation of 
the food. In the latter case, the ani- 
mal may be afraid of the food, but it is 
not afraid to eat it. Masserman (1943) 
has described similar feeding inhibi- 
tions in cats that received a brief air 
blast or electric shock at the moment 
of eating. Most of the cats refused to 
eat in the apparatus for months with- 
out further punishment, despite the 
fact that they were severely deprived 
of food. Using the Lashley jumping 
apparatus, Klee (1944) has found 
that a rat may starve to death rather 
than respond in an insoluble problem 
which results in food reward on half 
the trials and punishment on the other 
half. 


Delay of Punishment Gradient 


In a direct comparison between the 
contingent and noncontingent proce- 
dures, the aversive stimulus either oc- 
curs immediately after the response 
or it is unrelated to the response. If 
the contiguity between response and 
punishment is a parameter of conse- 
quence, then there should be a delay of 
punishment gradient. Such a gradi- 
ent has been found in a Y maze by 
Warden and Diamond (1931), but 
not by Bevan and Dukes (1955). It 
has also been found in a lever box 
by Sidman (1953), and in a shuttlebox 
by Kamin (1959) and by Coons and 
Miller (1960). 


In Kamin’s experiment, rats were 
given standard avoidance training in 
which the subject could avoid a shock 
of 1.1 milliamperes by moving to the 
other half of a shuttlebox within a 10- 
second CS-US interval, or it could 
escape from the shock by moving to the 
other half of the shuttlebox after the 
shock had gone on. After the subject 
met the acquisition criterion of 11 con- 
secutive avoidance responses, a punish- 
ment-extinction procedure was begun 
in which shock was administered only 
if the subject moved to the other side of 
the shuttlebox in the 10-second CS-US 
interval. During the punishment-ex- 
tinction period, there was a delay of 
punishment of 0, 10, 20, 30, or 40 sec- 
onds, and a control group that received 
no punishment for responses during 
extinction. The results showed a delay 
of punishment gradient, with the num- 
ber of responses to extinction positively 
related to the temporal interval between 
response and punishment. Kamin 
notes, however, that the number of 
responses to extinction was consider- 
ably greater in the unpunished control 
group than in the group with 40-second 
delay and considers this supportive of 
the generalized emotional effect of shock 
that was only remotely contingent. Al- 
though it must be recognized that 
shock, per se, whether or not it is as- 
sociated with a particular response will 
have clearly measurable effects, Ka- 
min’s results (1959) demonstrate that 
the temporal relationship of the punish- 
ment to the response is a relevant 
parameter. 


The CS-US Interval in Punishment 
Training 


Normally, in the punishment situation 
the aversive stimulus is applied 
immediately after the response. There 
are some studies of punishment, how- 
ever, that more closely resemble the 
avoidance procedure by using a fixed 
CS-US interval. In these studies punishment 
is applied a given number of 
seconds after the stimulus, if a par- 
ticular response occurs (Bixenstein, 
1956; Mowrer & Ullman, 1945). In 
the Mowrer and Ullman study, for ex- 
ample, rats were trained to go to a 
food cup and eat during a 3-second 
buzzer. During punishment-training, 
a 2-second shock was administered to 
a subject if it made the response of 
eating the food during the 3-second 
buzzer. (They were free to eat the 
food after this time.) The punish- 
ment for the response occurred 3 sec- 
onds, 6 seconds, or 12 seconds after 
the onset of the buzzer for the three 
groups. The results of this experi- 
ment showed that the percentage of 
trials in which the subjects waited 
throughout the 3-second period was 
inversely related to the CS-US in- 
terval. This experiment, presumably, 
can be interpreted in a manner similar 
to delay of punishment studies, e.g., 
the greater the interval between re- 
sponse and punishment, the less ef- 
fective the punishment for the suppres- 
sion of the response. 


Selective Punishment of a Quantitative 
Dimension of a Response 


In all of the studies considered so 
far, there has been a single measured 
response that would, under certain 
conditions, be followed by punishment. 
Now, we must consider cases of pun- 
ishment of selective learning in which 
there are two or more measured re- 
sponses. Logan (1960) has reported 
a number of experiments in which 
punishment was correlated with the 
speed characteristic of a response. 
Rats were trained to run down a 4-foot 
alley and then the final 1 foot was 
electrified for 150 milliseconds when 
the rat crossed it. Although there was 
no increase in speed when punishment 
was differentially applied to slow re- 
sponding, there was a decrease in speed 
when punishment was differentially 
applied to fast responding. 


Selective Punishment of a Qualitative 
Characteristic of a Response 


In a two-choice situation, if one re- 
sponse results in food reward for a 
hungry rat (the “right” response) and 
the other response does not (the 
“wrong” response), the evidence is 
overwhelming that punishment for the 
wrong response results in facilitation 
of the response differentiation. The 
early work with the discrimination 
box by Hoge and Stocking (1912) 
and Warden and Aylesworth (1927) 
clearly indicated that the rate of learn- 
ing is greater if nonreward and a 
brief punishment follow the wrong re- 
sponse than if merely nonreward fol- 
lows the wrong response. Muenzin- 
ger’s (1934) experiment may be taken 
as a model for this kind of finding. He 
trained rats in a T-shaped discrimi- 
nation box to run to a black or white 
card that was first visible from the 
choice point. Fifteen subjects were 
trained under a correction procedure 
with 75-millisecond pulses of a con-
stant current of 0.15-milliampere dc
for the wrong choice; 15 other subjects 
were trained under the same condi- 
tions, but no punishment. In 100 
trials of training, the group that was 
punished for wrong responses had a 
mean of 10.8 errors; the group that 
was not punished had a mean of 30.0 
errors (p < .01). Further evidence
that punishment of incorrect responses 
under a correction procedure increases 
the speed of learning of a visual dis- 
crimination habit is given by Muenzin- 
ger, Bernstone, and Richards (1938) 
and by Muenzinger and Powloski 
(1951). Punishment for incorrect re- 
sponses in a visual discrimination habit 
under the noncorrection procedure also 
produces faster learning than no pun-
ishment (Muenzinger, Brown, Crow,
& Powloski, 1952; Muenzinger & Pow-
loski, 1951; Wischner, 1947; Wisch-
ner, Fowler, & Kushnick, 1963). Al-
though Fairlie (1937) found that 
shock at the “moment of choice” for 
wrong responses did not facilitate 
learning, and Drew (1938) found that 
subjects with no punishment made 
fewer errors than subjects receiving 
punishment for wrong responses, there 
are few exceptions to the general state- 
ment that punishment for the incor- 
rect response results in faster learn- 
ing. There is also some evidence that 
punishment may lead to faster reversal 
learning in a two-choice situation 
(Whiting & Mowrer, 1943). 

In conclusion, it appears that both 
psychologists who emphasize the cor- 
relation between response and pun- 
ishment and those who emphasize the 
general emotionalizing effect of pun- 
ishment are correct. If an aversive 
stimulus is contingent upon the oc- 
currence of a response it will be more 
effective than if it is not contingent 
upon the response. Nonetheless, the 
mere presentation of stimuli associ- 
ated with an aversive stimulus may 
serve to suppress responding. No ade- 
quate theory of punishment can fail 
to take account of both observations, 
although it would seem to be of par- 
ticular importance to understand why 
the contingent procedure is more ef- 
fective than the noncontingent pro- 
cedure. 


PUNISHMENT OF POSITIVE INSTRU- 
MENTAL RESPONSES 


Although punishment contiguous 
with a response is more effective than 
equivalent aversive stimulation admin- 
istered independently of the response, 
punishment may not be an effective 
technique for reducing the strength of 
a response. Most psychotherapists do 
not use punishment techniques to eli-
minate the undesirable behavior of
their patients and, in fact, they typi-
cally act in a notably permissive man- 
ner. Boardman (1962), however, de- 
scribes the case of a 5-year-old boy 
with a severe behavior disorder who 
was effectively treated by a short 
period of punishment. The boy's 
symptoms included running away 
from school, lying, stealing, walking 
on the roof of his house, riding his bi- 
cycle on a busy street, setting fires, 
and destroying property. On instruc- 
tions from the therapist, the boy’s par- 
ents severely punished him for such 
misbehavior. When such behavior ap- 
peared they would immediately punish 
him by spanking him, refusing him 
meals, calling off his birthday party 
and presents, locking him in his room, 
and even locking him out of the house. 
Within a week of this treatment the 
major symptoms of this patient were 
eliminated, and they did not recur dur- 
ing the next 11 months. In comments 
on this paper, Bandura (1962) ex- 
pressed concern that the punishment 
technique would have undesirable side- 
effects, e.g., that the parents would 
serve as models for aggression, that the 
child would avoid his parents, that 
some of the child’s methods of avoid- 
ing the punishment might be unde- 
sirable, or that the punishment might 
increase his aggressive responses. 
Miller (1962) described various bases 
for the apparent success of the treat- 
ment, e.g., punishment helped the boy 
atone for his guilt, and the treatment 
resulted in his getting more attention 
from his parents. He predicted that 
the punishment treatment would have 
only temporary effect. 

Because of the difficulty of evalu- 
ating the success of the punishment 
technique in the clinical situation, we 
will examine the effect of experimental 
studies of punishment, most of which 
used rats as the subjects and electric 
shock as punishment. There are
many studies in which the subject is
trained to perform some positive in- 
strumental response, to run to the end 
of an alley or to press a lever for food 
reward, and then after training has 
progressed to some point, responses 
are punished as well as reinforced. 
This procedure establishes an ap- 
proach-avoidance conflict, extensively 
analyzed by Miller (1959). There is 
some evidence that an animal will 
learn to make a response to terminate 
such a conflict situation (Hearst & 
Sidman, 1961). The most reliable ef- 
fect in experiments comparing the pun- 
ishment-training procedure with the 
regular-training procedure is a sup- 
pression of response under conditions 
of punishment. 

A number of different aversive stim- 
uli have been used in studies of pun- 
ishment to suppress behavior, such as 
a slap from a lever (Skinner, 1938), 
a bump on the nose and a fall into a 
net (Maier, 1949), a loud noise (Carl- 
smith, 1961), a toy snake (Masserman 
& Pechtel, 1953), and a swat with a 
rolled-up piece of newspaper (Stanley 
& Elliot, 1962). Nonetheless, electric 
shock has been employed as the aver- 
sive stimulus in most studies of pun- 
ishment, and the characteristics of 
this punishing stimulus are particu- 
larly easy to measure and control. 
Electrodes firmly attached to the sub-
ject, either on the surface of the skin
or in some internal tissue, allow even
more exact control of the parameters 
of the punishing stimulus than the typ- 
ical grid electrodes. 

Of course, the intensity of the nox- 
ious stimulus employed in the punish- 
ment experiment is a critical factor in 
its effect on behavior. As the intensity 
of the punishment is increased, the 
following phenomena often emerge: 
(a) detection: the punishment has no 
influence on the response, although it 
is sufficiently intense to be used as a
cue; (b) temporary suppression: the
punishment results in a temporary sup- 
pression of the response, followed by 
complete recovery; (c) partial sup- 
pression: the punishment results in a 
suppression of the response, without 
complete recovery; (d) total suppres- 
sion: the punishment results in com- 
plete suppression of the response, with- 
out recovery. Other dimensions of the 
noxious stimulus have been less 
thoroughly studied than intensity, but 
they may be equally important.
Campbell and Teghtsoonian (1958) 
have described some of the conse- 
quences of variations of frequency and 
source impedance when external elec-
trodes are used. Finally, duration
may be particularly critical. If the 
punishment is of brief duration, the 
coulomb may be a more accurate re- 
flection than the ampere of its efficacy 
as a suppressor of behavior. If the 
punishment is of longer duration, re- 
sponses that happen to occur at the 
time of shock termination may become 
adventitiously reinforced. 
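
The distinction between the ampere and the coulomb for brief shocks can be made concrete with a small calculation. The sketch below assumes a constant current, so that charge is simply current multiplied by duration; the function name is hypothetical, and the only empirical values used are the 0.15-milliampere, 75-millisecond pulses quoted earlier for Muenzinger (1934). The doubled duration is an invented comparison, not a reported condition.

```python
def charge_coulombs(current_amperes: float, duration_seconds: float) -> float:
    """Charge Q = I * t delivered by a constant current (an assumption;
    real shock sources may not hold current constant)."""
    if current_amperes < 0 or duration_seconds < 0:
        raise ValueError("current and duration must be non-negative")
    return current_amperes * duration_seconds


# Values quoted in the text for Muenzinger (1934): 0.15 mA dc for 75 ms.
pulse = charge_coulombs(0.15e-3, 0.075)

# Hypothetical comparison: the same current applied for twice as long.
# The ampere value is unchanged, but the charge delivered doubles.
longer_pulse = charge_coulombs(0.15e-3, 0.150)

print(f"{pulse * 1e6:.2f} microcoulombs")
print(f"{longer_pulse * 1e6:.2f} microcoulombs")
```

Under these assumptions the quoted pulse delivers about 11.25 microcoulombs. Two brief shocks of equal current can therefore differ in delivered charge, which is the sense in which the coulomb may describe a brief punishment better than the ampere.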


Response Suppression as a Function 
of Punishment Intensity 


A recent study by Karsh (1962) 
may be taken as representative of 
studies indicating the degree of sup- 
pression is a monotonically increasing 
function of the level of intensity of the 
punishment. In Experiment I rats 
were trained to run to the goal sec- 
tion of an 8-foot alley for food rein- 
forcement during the first 75 trials. 
Then each subject received one trial 
per day for 40 days with both food 
and shock at the goal. The levels of 
shock intensity for the various groups 
were 0, 75, 150, 300, and 600 volts 
ac for 100 milliseconds administered 
through 250,000 ohms resistance in 
series with the rat. The subjects re- 
ceiving 75-volt punishment were simi-
lar to control subjects, and they showed
no clear change in speed of running
down the alley. The subjects with 300- 
and 600-volt punishments showed com- 
plete cessation of running within a few 
trials. Perhaps the most interesting 
group in this experiment was the 150- 
volt punishment group which, unlike 
the lower shock groups, slowed down, 
but unlike the groups with the higher 
shocks, did not cease to respond. 
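
Because the shock in this experiment was specified in volts through a fixed series resistance, the current actually reaching the subject is not given directly. A rough upper bound follows from Ohm's law if the subject's own resistance is ignored. The sketch below is illustrative only: the function name and the zero default for subject resistance are assumptions, and the true current would be lower and would vary with the animal's contact with the grid.

```python
SERIES_RESISTANCE_OHMS = 250_000  # series resistance quoted for Karsh (1962)


def max_current_milliamperes(volts: float,
                             subject_resistance_ohms: float = 0.0) -> float:
    """Ohm's law, I = V / R, returned in milliamperes.

    With the default subject resistance of zero (an assumption made only
    to obtain an upper bound), all of the voltage drops across the
    series resistor.
    """
    total_ohms = SERIES_RESISTANCE_OHMS + subject_resistance_ohms
    return 1000.0 * volts / total_ohms


# Shock levels quoted in the text for the five groups.
karsh_levels_volts = [0, 75, 150, 300, 600]
upper_bounds = {v: max_current_milliamperes(v) for v in karsh_levels_volts}

for volts, milliamperes in upper_bounds.items():
    print(f"{volts:>3} V -> at most {milliamperes:.1f} mA")
```

On this reckoning the ineffective 75-volt level corresponds to at most 0.3 milliampere and the partially suppressive 150-volt level to at most 0.6 milliampere. A large series resistor of this kind keeps the current from depending heavily on the subject's own, variable resistance.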

Several experiments by Azrin (1959, 
1960, 1961), Holz, Azrin, and Ulrich 
(1963), and Azrin, Holz, and Hake 
(1963) using pigeons in a Skinner 
box have also demonstrated that the 
intensity of the punishment is an im- 
portant variable determining the 
amount of suppression of a response. 
The procedure used was to train pi- 
geons in a positive instrumental re- 
sponse of pecking a key under some 
schedule of reinforcement, and then to 
punish every response at some level 
of shock intensity. The shock was 
a variable ac voltage administered for 
durations usually less than 100 milli- 
seconds through a fixed resistor into 
electrodes implanted in the subject’s 
back. A large number of sessions were 
run under various levels of shock in- 
tensity, including 0, either in ascend- 
ing order of intensities or in a mixed 
order. When a subject received ex- 
tremely mild punishment, e.g., 10 volts, 
there was no apparent change in its 
behavior relative to its performance 
with no punishment; when a subject 
received extremely intense punishment, 
e.g., 130 volts for durations somewhat 
longer than 100 milliseconds, com- 
plete suppression of response was ob- 
tained. At intermediate levels of 
shock intensity, e.g., 80 volts, the sub- 
ject reduced its rate of response, but 
it did not cease to respond. 

Despite radical differences between 
their procedures, both Karsh and Azrin 
found intensities of punishment that 
were ineffective, partially suppressive,
and totally suppressive. Other ex-
perimenters have also obtained greater
suppression as a function of increased 
intensity of the punishment (Dins- 
moor, 1952; Estes, 1944). 
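
The four outcomes distinguished earlier (detection, temporary suppression, partial suppression, and total suppression) can be stated as a decision rule on three response rates: the rate before punishment, the rate early in punishment, and the rate after extended punishment. The following is a minimal sketch under stated assumptions. The names and the 10% tolerance are hypothetical and are not taken from any of the studies reviewed; an investigator would have to choose criteria suited to the variability of the data.

```python
from enum import Enum


class Effect(Enum):
    DETECTION = "no influence on the response"
    TEMPORARY = "temporary suppression, complete recovery"
    PARTIAL = "suppression without complete recovery"
    TOTAL = "complete suppression without recovery"


def classify(baseline: float, early: float, late: float,
             tolerance: float = 0.10) -> Effect:
    """Classify a punishment effect from three response rates.

    baseline: rate before punishment was introduced
    early:    rate during the first punished sessions
    late:     rate after extended exposure to punishment
    tolerance: arbitrary margin (assumed here to be 10% of baseline)
               within which a rate counts as unchanged or as zero
    """
    if baseline <= 0:
        raise ValueError("baseline rate must be positive")
    early_ratio = early / baseline
    late_ratio = late / baseline
    if late_ratio <= tolerance:
        return Effect.TOTAL
    if late_ratio < 1 - tolerance:
        return Effect.PARTIAL
    if early_ratio < 1 - tolerance:
        return Effect.TEMPORARY
    return Effect.DETECTION


# Invented rates (responses per minute) for illustration only.
print(classify(100, 100, 100))
print(classify(100, 50, 100))
print(classify(100, 50, 50))
print(classify(100, 0, 0))
```

The rule makes explicit that the four categories differ in two respects only: whether responding is initially reduced, and how far it has recovered after continued punishment.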

Fowler (1959), in an extensive 
parametric investigation, found that a 
mild punishment of short duration ad- 
ministered to a rat at the moment it 
touches the food can increase individ- 
ual differences in running speed in an 
alley without affecting the average 
speed of a group. He presented evi- 
dence that whether a particular sub- 
ject increased or decreased its speed 
as a function of punishment depended 
upon whether the skeletal response 
elicited by the aversive stimulus was 
compatible or incompatible with the 
instrumental response. Despite the 
difficulty in distinguishing between 
compatible and incompatible elicited re- 
sponses, Fowler's observations provide 
some of the best evidence in support 
of the competing response hypothesis, 
at least for situations involving pun- 
ishments of brief duration. 


Response Suppression as a Function 
of Proximity to the Punishment 


A response suppression gradient has 
been frequently reported in both the 
case of punishment of positively rein- 
forced responses in the alley and in 
the Skinner box. For example, the 
speed of running is slower as the sub- 
ject approaches the goal box (Karsh, 
1962) and its strength of pull away 
from a punished goal is greater the 
nearer the subject is to the goal 
(Brown, 1948). In the case of re- 
sponding on a fixed interval schedule 
of punishment, the rate of responding 
decreased to almost 0 close to the 
temporal point when punishment was 
to be received and, as a result, the 
subjects received few punishments 
(Azrin, 1956). The degree of re-
sponse suppression is a direct function
of the proportion of the responses that
result in punishment, so that intensi- 
ties of punishment that are effective if 
they typically follow a response may 
be ineffective if they only occasionally 
follow a response (Azrin, Holz, & 
Hake, 1963). 


Response Suppression as a Function 
of the Strength of the Punished Re- 
sponse 


In general, it may be supposed that 
the amount of suppression of a re- 
sponse is inversely related to its 
strength. Postman (1947) described 
the evidence available at the time that 
responses of weak strength are more 
liable to disruption by punishment. 
Estes (1944, Experiment F) is con- 
sistent with this interpretation. Miller 
(1959) proposed that the factors that 
increased the excitatory strength of 
the positive response would serve to 
decrease the effectiveness of punish- 
ment. Thus the amount of suppres- 
sion would be decreased by increase 
in drive level, decrease in delay of 
reward, increase in amount of reward, 
and increase in number of trials of 
training. The data of Bower and 
Miller (1960) support the notion that 
increasing the amount of reward in- 
creases the subjects’ resistance to pun- 
ishment, but the data on the effect of 
number of trials of training were not 
at all in the expected direction. Miller 
(1960) and Karsh (1962) found that 
overtraining increased, rather than de- 
creased, the effect of punishment. 


Response Suppression as a Function 
of Prior Exposure to Punishment 


The effect of the intensity of the 
punishment, specified in physical terms, 
may be influenced by the amount and 
type of prior exposure to punishment 
that the subject has previously experi- 
enced. Therefore, the independent 
groups design utilized by Karsh (1962)
in the study of the effect of intensity
of punishment is useful in eliminating
the variable of prior exposure. In the
absence of data, one might speculate 
either that prior exposure to shock 
would serve to increase the resistance 
of a rat to later disruption by shock 
(adaptation) or one might expect that 
such exposure would make the rat emo-
tional and serve to decrease its re-
sistance to later disruption by shock
(sensitization). The data consistently 
support the adaptation hypothesis. 
Miller (1960) investigated this 
problem by training rats to run down 
an alley for food reinforcement for 
150 trials and then subjecting them to 
a punishing shock of 335 volts through 
250,000 ohms for .1 second when the 
subject picked up the food reinforce- 
ment. These subjects were compared 
with a group that had a gradually in- 
creasing level of punishing shock dur- 
ing the last 75 trials of the 150 trials 
of training. The group with the grad- 
ually increasing shock was much less 
affected by the 335-volt punishment 
than the group that did not have prior 
exposure to the shock. This result 
suggests that experiments using an 
ascending order of punishment err in
the conservative direction with respect 
to obtaining treatment differences, al- 
though such differences can be ob- 
tained using the ascending order (Lo- 
gan, 1960). With respect to the ef- 
fectiveness of the gradually increasing 
exposure to shock on later reduction 
in the effect of punishment, it is im- 
portant to note that there was no 
adaptation if the shock was adminis- 
tered in an apparatus different from 
the one used in training. A similar 
result has been reported in the case 
of adaptation of the CER (Kamin, 
1961). Baron and Antonitis (1961), 
however, found that 18 trials of shock 
in one apparatus reduced the suppres-
sive effect of punishing shock in an-
other apparatus on the untrained lever-
press response of mice.

A second line of evidence in favor 
of adaptation to punishment is pro-
vided by Azrin (1959, 1960, 1961)
who found that continued presenta- 
tion of brief punishments of moderate 
intensity results in an immediate par- 
tial suppression of response, followed 
by complete recovery after a number
of sessions. 


Effect of Punishment-Extinction on 
Resistance to Extinction 


Most of the studies of punishment of 
positive instrumental responses have 
involved a comparison of the punish- 
ment-training and the regular-training 
procedures. Only a few studies have 
compared punishment-extinction with 
regular-extinction, but they demon- 
strate that subjects respond more 
slowly during punishment extinction 
(Matsumiya, 1960; Mowrer & Aiken, 
1954; Mowrer & Solomon, 1954). All 
of these studies have employed a stim- 
ulus associated with a primary noxious 
stimulus as the punishment. The pro- 
cedure was (a) to allow a rat to press 
a lever for food reinforcement, (b) to 
present a neutral stimulus in temporal 
contiguity with an unconditioned shock 
stimulus, and (c) to count the num- 
ber of responses per minute of the sub- 
jects under conditions when each re- 
sponse produced a brief presentation of 
the formerly neutral stimulus. These 
studies have demonstrated that the 
amount of response suppression is a 
function of intensity of the punishment 
(Matsumiya, 1960), CS-US pattern 
(Matsumiya, 1960; Mowrer & Aiken, 
1954) but not of the conditions of US 
termination (Mowrer & Solomon, 
1954). It has been observed that if a 
parent threatens to punish and does 
punish his child long after the re- 
sponse has occurred, the act may not 
be suppressed but the parent’s threats
may become an object of fear. Fol-
lowing the paradigm of the experi-
ments above, the threats of punish- 
ment by such a parent, administered in 
association with a response, should 
serve to suppress the response. 
Although the subjects reach a cri- 
terion of extinction more rapidly 
under conditions of punishment-ex- 
tinction than under conditions of 
regular-extinction, several studies sug- 
gested that punishment did not affect 
resistance to extinction. Skinner 
(1938) described a situation in which 
a short period of mild punishment did 
not serve to reduce the number of re- 
sponses during regular extinction. 
Estes (1944, Experiment A) found 
that a 10-minute period of mild pun- 
ishment did not influence the number 
of responses to extinction or the time 
to extinction. However, a 10-minute 
period of severe punishment did re- 
duce the number of responses to a cri- 
terion of extinction, although it did 
not affect the time to reach that crite-
rion, and a 1-hour period of severe
punishment reduced both the time to 
reach a criterion of extinction and the 
number of responses to reach that cri- 
terion (Estes, 1944, Experiments B 
and C). Dinsmoor (1952) also found 
conditions under which punishment- 
extinction was more effective than an 
equal period of regular-extinction in 
reducing resistance to extinction. 


Response Facilitation under Condi- 
tions of Punishment 


Although the dominant effect of pun- 
ishment of a response is the suppres- 
sion of that response, there has been 
a continuing search for paradoxical 
effects of punishment. There are a 
number of situations in which punish- 
ment of a positive instrumental re- 
sponse results in facilitation of the 
response.

Discriminative properties of pun-
ishment. The clearest cases of re-
sponse facilitation under conditions of 
punishment have been provided by the 
experiments of Holz and Azrin (1961, 
1962). In these studies, if punishment 
is associated with nonreinforcement it 
results in a decrease in response rate, 
but if punishment has been correlated 
with positive reinforcement it may re- 
sult in an increase in response rate. 
In their first study of this phenomenon, 
Holz and Azrin (1961) demonstrated 
that pigeons responded far more rapidly 
under conditions of punishment-extinc- 
tion than under conditions of regular- 
extinction if they had previously been 
given both sessions of regular-extinc-
tion and punishment-training. This
result was obtained both with a punish- 
ment that reduced the rate of response 
to one-half its previous level and with 
an intensity that did not influence the 
response rate. 

In their recent study of the discrimi- 
native function of punishment, Holz 
and Azrin (1962) trained pigeons on 
a fixed interval schedule of reinforce- 
ment and punished all responses dur- 
ing various portions of the interval 
with shocks of various intensities. The 
effect of the punishment at fairly low 
intensities was similar to that of a 
response-produced neutral cue, a green
light. At higher intensities there was
increased suppression. Apparently, 
the punishment may serve as a re- 
sponse-produced cue correlated with 
the reinforcement schedule, and an 
event leading to suppression of re- 
sponse rate. 

Punishment of an incompletely 
learned response. If punishment-train- 
ing is introduced before the asymptotic 
performance under regular training 
procedure has been reached, there may 
be further improvement under the 
punishment training procedure. Such 
an observation was made by Karsh
(1962), and in such cases there is no
reason to believe that the introduction
of the punishment served to increase
response speed more than would have
been obtained with additional trials un-
der regular training conditions.

Contrast effects. One of the more 
interesting facilitation effects in the 
case of punishment of a positive in-
strumental response is that sometimes
observed when the punishment is
omitted. Azrin (1960, 1962) found
that if a mild punishment is applied to 
every response there is a temporary 
suppression and then a complete re- 
covery. When punishment is termi- 
nated, however, there may be a tem- 
porary increase in response rate over 
that which would have occurred with- 
out punishment. Similarly, the rate 
of response in the presence of a stimu- 
lus correlated with positive reinforce- 
ment may be increased by the punish- 
ment of responses in the presence of 
another stimulus (Brethower & Reyn- 
olds, 1962). 

These observations may be taken as 
evidence for “contrast” but its opposite, 
“generalization,” can often be found. 
Generalization may be said to occur 
when punishment of one response af- 
fects related responses in a similar 
manner, but perhaps to a lesser extent. 
Thus termination of punishment may 
be followed by a residual suppression 
of response (Hunt & Brady, 1955) 
and punishment of responses in the 
presence of one stimulus may decrease 
the response rate in the presence of 
other stimuli (Dinsmoor, 1952). Fur- 
ther work is necessary to determine 
the conditions under which pun- 
ishment of one response weakens, 
strengthens, or leaves unchanged re- 
lated responses. 


Punishment of the “Right” Response 


Muenzinger (1934) described one 
of the most baffling of the paradoxi-
cal effects of punishment, when, in his
highly original study, he found that 
punishment of the “right” response in 
a selective learning situation (i.e., the 
response that leads to food) served to 
increase the rate of development of 
response differentiation. Muenzinger 
noted that the earlier work of Hoge 
and Stocking (1912) and Warden and 
Aylesworth (1927) demonstrated that 
shock after the wrong response in- 
creased the rate of selective learning. 
Was the increase due to the correla- 
tion of the punishment with the wrong 
response or was it due to some other 
characteristic of the shock? Muen- 
zinger’s first attempt to answer this 
question was to determine the rate of 
learning of groups of rats punished for 
correct responses, and to compare their 
performance with that of rats that 
were not punished or punished for in- 
correct responses. The results of the 
first experiment were dramatic. The 
subjects shocked for correct responses 
were similar to those that were shocked 
for incorrect responses, both of which 
learned the discrimination habit more 
quickly than the subjects without pun- 
ishment. Apparently punishment did 
not act on a specific response but had 
some more general function. 

One possible explanation for these 
findings, and the one emphasized by 
Muenzinger, was that the punishing 
shock after the choice point served 
to slow down the subject at the choice 
point so that it was more fully ex- 
posed to the relevant cues. To explore 
this possibility, Muenzinger and Wood 
(1935) compared the performance of 
subjects that were shocked after each 
choice with those shocked before each 
choice. The former group learned 
more quickly than the latter. The sub- 
jects that were punished on each re- 
sponse, whether right or wrong, 
learned about as quickly as those that 
were punished on all correct responses
or on all incorrect responses. The
subjects that were punished before 
each choice learned no more quickly 
than the subjects that were not pun- 
ished. The evidence was mounting 
that punishment facilitated selective 
learning because it slowed down the 
subject at the choice point. Other 
methods of slowing down the subject 
at the choice point also served to in- 
crease the speed of discrimination 
learning, such as a gap after the choice 
point, but not a gap before the choice 
point (Muenzinger & Newcomb, 1936) 
and an enforced delay at the choice 
point (Muenzinger & Fletcher, 1937). 

The conclusions from Muenzinger’s 
1934 experiment had to be modified 
in a critical way after a replication of 
the research demonstrated that subjects 
shocked for all correct responses made 
more errors than subjects shocked for 
all wrong responses (Muenzinger, 
Bernstone, & Richards, 1938). Thus, 
as we have previously observed, pun- 
ishment of a qualitative characteristic 
of a response selectively suppresses 
that characteristic. Nevertheless, the 
investigators again found that the 
average number of errors in 100 trials 
of training was significantly greater 
for subjects that were not punished 
than for subjects that were shocked 
on all correct responses, although the 
magnitude of the difference was not 
as great as in the original experiment. 

Wischner (1947) performed an im- 
portant experiment demonstrating that 
the statement that punishment for cor- 
rect responses increases the speed of 
learning is too broad a generalization. 
Using a noncorrection method in a 
discrimination box, Wischner (1947) 
found that the group that was pun- 
ished for wrong responses learned more 
quickly than those not punished, but 
that the group that was punished for 
right responses was similar with re-
spect to the total number of errors to
the group that was not punished.
Wischner suggested that the superior- 
ity of the group punished for correct 
responses over the group not punished 
may be a finding restricted to the cor- 
rection method. In a comparison of 
the results of this experiment with his 
own, Muenzinger (1948) emphasized 
the relevance of the definition of learn- 
ing efficiency. In the correction 
method it is traditional to define a trial 
as a sequence of one or more re- 
sponses ending in a reinforcement, 
and to define an error as a sequence 
in which the subject enters the incor- 
rect alley one or more times. In the 
noncorrection method, on the other 
hand, it is traditional to define a trial 
as a single entry into one of the two 
alleys, and an error as an entry into 
the incorrect alley. In Wischner’s 
(1947) experiment subjects in the 
shock-right group began with signifi- 
cantly greater than chance number of 
errors but when they did begin to 
learn to enter the alley with the shock 
they learned quickly. Thus different 
measures of learning efficiency may 
give different conclusions. What is 
the proper measure of efficient learn- 
ing? If time is short, the number of 
trials to criterion is critical; if mis- 
takes are costly, the number of errors 
to criterion is critical, but if the cost 
of the reward is greater, then the num- 
ber of reinforcements to criterion is 
critical. 
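
The dependence of the conclusion on the scoring convention can be illustrated by applying both sets of definitions to a record of alley entries. The sketch below is a hypothetical illustration: the function names, the "R"/"W" coding of entries into the correct and incorrect alleys, and the sample record are all invented. In practice a single record would arise under only one of the two procedures; the same record is scored twice here solely to show how the definitions of trial and error differ.

```python
def score_correction(entries: str) -> tuple[int, int]:
    """Correction-method scoring.

    A trial is a sequence of one or more entries ending in a reinforced
    (correct) entry; an error is a trial containing at least one entry
    into the incorrect alley. A trailing sequence that never reaches
    the correct alley is not counted.
    """
    trials = 0
    errors = 0
    wrong_in_sequence = False
    for entry in entries:
        if entry == "W":
            wrong_in_sequence = True
        elif entry == "R":
            trials += 1
            errors += int(wrong_in_sequence)
            wrong_in_sequence = False
        else:
            raise ValueError(f"unknown entry code: {entry!r}")
    return trials, errors


def score_noncorrection(entries: str) -> tuple[int, int]:
    """Noncorrection-method scoring.

    Every single entry is a trial; every entry into the incorrect
    alley is an error.
    """
    unknown = set(entries) - {"R", "W"}
    if unknown:
        raise ValueError(f"unknown entry codes: {sorted(unknown)}")
    return len(entries), entries.count("W")


record = "WWRRWR"  # invented sequence of alley entries
print(score_correction(record))     # (trials, errors)
print(score_noncorrection(record))  # (trials, errors)
```

For this invented record the correction definitions yield 3 trials and 2 errors, whereas the noncorrection definitions yield 6 trials and 3 errors. Counts of trials and errors obtained under the two methods are therefore not directly comparable.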

Wischner’s (1948) reply emphasized
the differences between the methods
employed rather than the definition
of the efficiency of learning. Although 
Wischner emphasized the significance 
of the use of the correction technique 
to obtain the facilitation effect, Muen- 
zinger found that the superiority of a 
shock-right group over a nonshocked 
group may be obtained under condi- 
tions of the noncorrection procedure. 
Muenzinger and Powloski (1951)
found that the shock-right group
learned more quickly than a nonshock 
group under a noncorrection procedure, 
although the differences between the 
treatments were more pronounced with 
a correction procedure. Muenzinger, 
Brown, Crow, and Powloski (1952) 
found that the shock-right group pro- 
duced faster learning than a nonshock 
group after pretraining trials with 
shock. With shock adaptation the 
shock-right group was similar to the 
shock-wrong group; without shock 
adaptation the shock-right group was 
similar to the nonshocked groups. 
Prince (1956, Experiment II) found 
that after 25 trials of regular training 
subjects under conditions of punish- 
ment for correct responses showed 
faster learning than nonshocked con- 
trols. (The differences between pun-
ished and unpunished groups were less
apparent with 0 or 15 trials of regular
training.)

Wischner, Fowler, and Kushnick 
(1963) observed that the Muenzinger 
noncorrection experiments have some 
similarities to the typical correction 
experiments, e.g., trials are massed
and the location of the stimuli is not 
changed after an error. They found 
more rapid learning of a visual discrim- 
ination habit with nonshock than with 
shock for the correct response at all 
shock intensities used. As punishment 
intensity increased, the magnitude of 
the differences increased. Thus, at 
the present time there is a clear con- 
flict in the data regarding the relative 
efficiency of the shock-right and non- 
shock procedures. 

The effect of punishment for every 
response in selective learning is not yet 
clear. As noted above, Muenzinger
and Wood (1935) found in a correc- 
tion procedure that punishment of 
every response resulted in faster learn- 
ing than no punishment. Freeburne 
and Taylor (1952), using a noncorrec-
tion procedure, also found that sub-
jects shocked for both right and wrong 
responses took fewer trials to criterion 
than subjects that received no shock. 
Prince (1956, Experiment I), how- 
ever, using a noncorrection procedure, 
found subjects shocked for both right 
and wrong responses took an equiva- 
lent number of trials and errors as a 
group that received no shock. 

To summarize, in the two-choice 
discrimination learning situation the 
experimenters may punish incorrect 
responses, correct responses, both in- 
correct and correct responses, or 
neither incorrect nor correct responses. 
The rate of discrimination learning is 
typically fastest when incorrect re- 
sponses are punished, a result that re- 
quires a theory involving the correla- 
tion of response and punishment. In 
some situations punishment for correct 
responses reliably results in faster 
learning of a discrimination and in 
other situations it reliably results in 
slower learning of a discrimination. 
An important contribution would be 
made by the identification of a param- 
eter that would result in facilitation by 
punishment at some values and inhibition
by punishment at other values.
When punishment of correct responses 
results in facilitation of a discrimina- 
tion, it is probable that the facilitation 
is not because the punishment is se- 
lectively paired with the correct re- 
sponse but in spite of it. In no case 
has it been demonstrated that punish- 
ment for all correct responses leads to 
faster learning than punishment for 
correct and incorrect responses. 


PUNISHMENT OF NEGATIVE INSTRUMENTAL RESPONSES


Punishment of a negative instru- 
mental response reinstates one of the 
training conditions, and thus may serve 
to increase the strength of a response. 
Bandura (1962) in his discussion of 
the treatment of a boy with a severe
behavior disorder described by Board- 
man (1962), observed that punish- 
ment might augment the undesirable 
behavior by generating hostile feelings 
similar to those that may have caused 
the original behavior disorder. Mowrer 
(1944, 1947) observed that if the sub- 
jects learn to make a particular instru- 
mental avoidance act when in a state of 
anxiety, punishing that act may 
strengthen it by increasing the anxiety. 
In support of this hypothesis, Mowrer 
cited some observations of Judson 
Brown that a rat that had learned to
avoid shock by running during a 10-second
CS-US interval continued to
respond indefinitely even after it was 
punished only for making the avoid- 
ance response. This is one of the most 
important and fascinating of the para- 
doxical effects of punishment, and it 
suggests that punishment of negative 
instrumental responses may produce 
considerably different results than pun- 
ishment of positive instrumental re- 
sponses. 


Punishment of Escape Responses 


The first major test of Mowrer’s hy- 
pothesis of the paradoxical effect of 
punishment on acts motivated by fear 
was an experiment by Gwinn (1949). 
Gwinn trained rats to run around a 
circular alley for 18 escape trials and 
then compared several punishment-extinction
procedures with a regular-extinction
procedure. The results indicated
that resistance to extinction was
greater under conditions of punishment-extinction
than under conditions
of regular-extinction, and that it was 
higher with a more intense punishing 
shock than with a lower shock. Al- 
though subjects punished for escape 
responses did not continue to run in- 
definitely, they did take more trials to 
reach a criterion of extinction than 
subjects not punished for their escape 
responses. Brown, Martin, and Morrow
(1962) have presented further
evidence in support of the hypothesis 
that resistance to extinction of an es- 
cape response is greater under condi- 
tions of punishment-extinction than 
under conditions of regular-extinction. 

Several studies have not found pun- 
ishment of escape responses to increase 
resistance to extinction (Moyer, 1957; 
Seward & Raskin, 1960, Experiment 
IV). Moyer (1957) trained 18 rats 
to escape from shock by running to a 
goal box in an alley on 10 escape trials. 
The mean number of trials to a 2-min- 
ute extinction criterion was similar 
under conditions of punishment-extinction
and regular extinction, although
the variance of the number of extinc- 
tion responses was greater in the pun- 
ished group than in the regular ex- 
tinction group. Furthermore, Moyer 
(1957) presented evidence that “the 
shock group extinguished quite sud- 
denly, whereas the nonshock group 
gradually approached the 2-min. cri- 
terion.” 

Seward and Raskin (1960, Experi- 
ment IV) trained 45 rats to escape 
from shock by running to a goal box 
in an alley for 20 trials. A group re- 
ceiving a punishing shock on every 
extinction trial met a 10-second criterion
of extinction in fewer trials than
a group under regular extinction. 
Seward and Raskin (1960, Experi- 
ment IV) note that “The shocked rats 
appeared to meet the criterion sud- 
denly, i.e., they ran fast or not at all. 
Control Ss, on the other hand, slowed 
down progressively.” 

To review the results of the above 
four studies: two gave evidence that 
punishment of escape responses in- 
creases resistance to extinction (Brown, 
Martin, & Morrow, 1962; Gwinn, 
1949); one presented evidence that 
punishment of escape responses does 
not affect resistance to extinction 
(Moyer, 1957); and one gave evidence
that punishment of escape responses 
decreases resistance to extinction 
(Seward & Raskin, 1960). Our un- 
derstanding of this phenomenon would 
be greatly increased if it were possible 
to obtain all three results in a single 
experiment by varying some param- 
eter. Although the punishment-extinc- 
tion procedure does not reliably in- 
crease or decrease the number of trials 
to extinction, it does reliably change 
the course of extinction. Under the 
regular-extinction procedure, the proc- 
ess is gradual; under the punishment- 
extinction procedure it is abrupt. 


Punishment of Avoidance Responses 


There have been a number of studies 
on the effect of punishment of avoid- 
ance acts on resistance to extinction. 
As previously noted, Mowrer (1947) 
cited the observations of Brown in 
which “flight from this area continues
to occur indefinitely.” In 1946-47, 
Whiteis (1955) obtained evidence sup- 
portive of the hypothesis that punish- 
ment of avoidance acts serves to in- 
crease resistance to extinction. Al- 
though the details of the experiment 
have not been published, the experi- 
ment is cited by Mowrer (1947) and 
Whiteis (1956). The apparatus was 
an alley and subjects, 12 rats, were 
given 50 trials of avoidance training 
with a CS-US interval of 10 seconds. 
Six subjects were then subjected to a 
punishment-extinction procedure in 
which they would receive shock only 
if they made the response; the other 
six rats were given a regular-extinc- 
tion procedure. Ten trials a day were 
run, and the criterion of extinction 
was a single trial in which the subject 
spent more than 120 seconds before 
entering the goal box, or 250 trials,
whichever came first. The subjects 
under the punishment-extinction pro- 
cedure showed an immediate decrease 
in mean response latency upon the introduction
of the procedure. Four of
these subjects continued to run at a 
speed of about 4 feet per second for 
250 trials. The two subjects that met 
the extinction criterion did so abruptly, 
i.e., a rapid run on one trial was fol- 
lowed by a 2-minute wait on the next 
trial. Four of the six subjects under 
the regular extinction procedure met 
the criterion of extinction, and all of 
them showed a gradual increase in re- 
sponse latency. On Trial 80, for ex- 
ample, the regular-extinction subjects 
had a mean response time of 28.2 sec- 
onds; the punishment-extinction sub- 
jects had a mean response time of 1.0 
second (p = .01). 

Seward and Raskin (1960, Experi- 
ment V) investigated the effect of 
punishment on an avoidance response, 
repeating all other details of their ex- 
periment on the effect of punishment 
on an escape response. The subjects
in groups punished for every response,
punished on half the responses, and 
not punished during extinction were 
indistinguishable with respect to num- 
ber of trials to extinction. 

In the course of their work on 
traumatic avoidance learning in dogs 
in a shuttle box, Solomon and his as- 
sociates frequently employed a shock 
extinction procedure (Brush, 1957; 
Brush, Brush, & Solomon, 1955; 
Church & Solomon, 1956; Solomon, 
Kamin, & Wynne, 1953; Wynne & 
Solomon, 1955). In these experi- 
ments a dog would be trained to per- 
form an instrumental avoidance act 
of jumping over a barrier to a cri- 
terion of avoidance, given a fixed num- 
ber of regular extinction trials, and 
then given a punishment-extinction 
procedure in which the dog would re- 
ceive shock of 3-second duration only 
if it made the instrumental response 
to the signal. The results indicated 
a remarkably high resistance to extinction
under conditions of punishment.
In the first experiment (Solomon, 
Kamin, & Wynne, 1953) only one of 
seven dogs previously given 200 regu- 
lar-extinction trials and only two of six 
dogs given 10 regular-extinction trials 
extinguished with 100 trials of pun- 
ishment-extinction. In the later ex- 
periments the punishment procedure 
was slightly more effective. The most 
extensive study of the punishment- 
extinction procedure in the shuttle box 
with dogs was performed by Brush 
(1957). He found that 73% of the 
25 dogs given 10 trials of regular-ex- 
tinction met the criterion in 100 trials 
but only 36% of the 25 dogs given 200 
trials of regular-extinction met the cri- 
terion in 100 trials of punishment-ex- 
tinction. The punishment-extinction 
procedure was not regarded by Solo- 
mon and his associates as an efficient 
method of eliminating an instrumental 
avoidance response, although it was 
certainly more efficient than the regu- 
lar-extinction procedure. The behav- 
ior of subjects that continued to re- 
spond during punishment-extinction 
was interesting. “These dogs jumped 
faster and more vigorously into the 
shock than they had jumped previously 
under the ordinary extinction proce- 
dure [Solomon, Kamin, & Wynne, 
1953, p. 295].” Their anticipatory 
responses, especially vocalization, 
clearly indicated that they expected 
the punishment. Nonetheless, many 
dogs continued to make the response 
during 100 trials of punishment-ex- 
tinction. 

Appel (1960) trained monkeys to 
postpone the occurrence of a shock for 
20 seconds by making a lever response, 
and then alternated periods of such 
avoidance training in the presence of 
one stimulus with periods of punish- 
ment-extinction in the presence of an- 
other stimulus. During punishment- 
extinction only the first response after 
a mean interval of 6 minutes was followed
by a 500-millisecond shock.
Punishment of the avoidance response 
resulted in an initial period of in- 
creased response rate before eventual 
suppression of the response. Also 
using a free-responding procedure, 
Black and Morse (1961) trained dogs 
to postpone the occurrence of a shock 
by making a response of jumping over 
a barrier in a shuttle box. Again, a 
punishment-extinction procedure typi- 
cally resulted in an initial period of 
increased response rate, followed by 
eventual suppression. Three other 
studies, however, present evidence 
that the punishment-extinction pro- 
cedure may produce more rapid ex- 
tinction than the regular extinction 
procedure. Moyer (1955) found this 
result with rats in a straight alley, 
and Kamin (1959) and Imada (1959) 
found this result with rats in a shuttle- 
box. Thus, as in the case of punish- 
ment of escape responses, there is evi- 
dence that punishment-extinction is 
more effective, equally effective, and 
less effective than regular-extinction. 
No study has yet found all three ef- 
fects as a result of variation of a single 
parameter, but a considerable contri- 
bution would be made by such a study. 

Some theoretical considerations. In 
most experiments in which punishment 
of negative instrumental responses has 
resulted in facilitation of the response, 
the punishment reinstated a condition 
present earlier in training, it elicited 
fear, the punishment elicited skeletal 
responses that were similar to the pun- 
ished response, and the termination of 
the punishment may have coincided 
with responses similar to the punished 
response. Any of the theoretical mech- 
anisms we have examined previously, 
except the suppression hypothesis and 
the avoidance hypothesis, may be 
adapted to account for these paradoxical
findings of apparently “masochistic”
behavior. Carlsmith (1961), however,
has described a procedure, involving 
qualitatively different aversive stimuli, 
which may serve to distinguish among 
the alternative hypotheses. Half the 
rats learned an avoidance response in 
a straight alley with an electric shock 
as the aversive stimulus; the remaining
subjects learned the avoidance response
with a loud horn as the aversive
stimulus. Half the subjects in
each of these groups were extinguished
with the same punishing stimulus that 
was used in the original avoidance 
training; the remaining subjects were 
extinguished using the punishing stim- 
ulus that was not used in the original 
training. Carlsmith (1961) found that 
the mean number of trials to a cri- 
terion of extinction was uninfluenced 
by the conditions of original training 
or by the conditions of punishment, 
but that there was a large and signifi- 
cant interaction effect. If the same 
aversive stimulus was used as a pun- 
ishment that was used as the uncondi- 
tioned stimulus for avoidance training, 
resistance to extinction was much 
greater than if a different aversive 
stimulus was used as a punishment. 
Although control groups with a regu- 
lar-extinction procedure were not used 
to determine whether the punishing 
stimulus resulted in actual absolute 
facilitation, the results strongly sug- 
gested the discrimination hypothesis, 
i.e., that facilitation may occur in the 
case of punishment of negative instru- 
mental acts because of the reinstate- 
ment of specific stimuli present earlier 
in the training. There is no evidence 
that the fact that the aversive stimulus 
is contingent upon the response in- 
creases the response facilitation. On 
the contrary, the response-dependence 
of the aversive stimulus is probably 
the factor responsible for the eventual 
suppression of the response typically 
observed in these experiments. 
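
The pattern reported for the Carlsmith (1961) design, no main effect of either factor but a large interaction, can be made concrete with a small sketch. The cell means below are hypothetical values chosen only to show the arithmetic of a crossover interaction in a 2 x 2 design; they are not Carlsmith's data, and the function names are illustrative.

```python
# Hypothetical cell means (trials to an extinction criterion) for a 2 x 2
# design: (training stimulus, punishing stimulus). These numbers are invented
# for illustration only; they are NOT data from Carlsmith (1961).
cell_means = {
    ("shock", "shock"): 40.0,  # same stimulus in training and punishment
    ("shock", "horn"): 10.0,   # different stimulus
    ("horn", "shock"): 10.0,   # different stimulus
    ("horn", "horn"): 40.0,    # same stimulus
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def main_effect(factor_index):
    """Marginal mean for 'shock' minus marginal mean for 'horn' on one factor.

    factor_index 0 is the training stimulus, 1 is the punishing stimulus.
    """
    shock = mean(v for k, v in cell_means.items() if k[factor_index] == "shock")
    horn = mean(v for k, v in cell_means.items() if k[factor_index] == "horn")
    return shock - horn


def interaction():
    """Mean of same-stimulus cells minus mean of different-stimulus cells."""
    same = mean(v for k, v in cell_means.items() if k[0] == k[1])
    different = mean(v for k, v in cell_means.items() if k[0] != k[1])
    return same - different


training_effect = main_effect(0)
punishment_effect = main_effect(1)
interaction_effect = interaction()
print(training_effect, punishment_effect, interaction_effect)
```

With these assumed values both marginal differences are zero while the same-versus-different contrast is 30 trials, which is the shape of result that the discrimination hypothesis predicts.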

Factors affecting punishment-extinction.
There are a number of factors
which may be important in determin- 
ing the relative resistance to extinction 
under conditions of regular extinction 
and punishment-extinction. 

1. Resistance to punishment-extinc- 
tion may rise more rapidly than the 
resistance to regular extinction as a 
function of an increase in the strength 
of the original response. Brush (1957) 
found greater resistance to punish- 
ment-extinction after 200 trials of reg- 
ular extinction than after 10 trials of 
regular extinction; Moyer (1955, Ex- 
periment I) found greater resistance 
to extinction after 50 or 110 trials of 
avoidance training than after 10 trials 
of avoidance training. Black and 
Morse (1961) also found that the 
greater the length of previous avoid- 
ance training, the longer it took for 
the punishment-extinction procedure 
to produce eventual suppression of 
avoidance responding. 

2. The presence of an external cue 
at the site of punishment will decrease 
the resistance to punishment-extinc- 
tion. Moyer (1955, Experiment III) 
found that the placement of an addi- 
tional cue at the locus of the punish- 
ment decreased the number of trials 
to extinction. Whiteis (1956, Experi- 
ment II) also emphasized the im- 
portance of the cue at the point of 
punishment. 

3. The intertrial and intersession in- 
tervals may be important, and they 
have varied widely in various experi- 
ments. Although the intersession in- 
terval has been shown to be important 
in avoidance learning (Kamin, 1957), 
the only punishment study in which 
the interval between acquisition and 
punishment-extinction was varied has 
been performed by Moyer (1955, Experiment
II). He found that responding
was faster after a 1-day interval
than after 7, 15, 30, or 60 days. 

4. The intensity of the punishment
undoubtedly plays an important role
in the effect of the punishment-extinc- 
tion procedure on resistance to extinc- 
tion, but the relationship is uncertain. 
Seward and Raskin (1960) suggested 
that the shock intensity they used for 
punishment, 190 volts through 150,000 
ohms resistance, may be too great to 
obtain a facilitation effect whereas the 
milder shocks of Gwinn, 60 volts and 
120 volts through 250,000 ohms, might 
produce facilitation. Available data 
on variations of punishment intensity 
on negative instrumental responses, 
however, suggest greater facilitation as 
a function of increase in shock inten- 
sity (Gwinn, 1949; Imada, 1959). 
Using five levels of punishment inten- 
sity and a regular-extinction control 
group, Imada (1959) found that 
weaker shocks appeared to suppress 
responding more than stronger shocks, 
although response speed and number 
of responses to extinction were less 
under conditions of punishment-extinc- 
tion than under conditions of regular- 
extinction. 

5. Finally, arbitrary and trivial as it 
may seem, the criterion of extinction 
is not to be ignored with respect to the 
resistance to extinction of escape and 
avoidance acts that have been punished. 
In some experiments the subject is 
said to have become extinguished if it 
fails to make the response in 120 
seconds on 10 consecutive opportuni- 
ties (Solomon, Kamin, & Wynne, 
1953). In other experiments the sub- 
ject is said to have extinguished if it 
fails to make the response in 10 sec- 
onds on one trial (Gwinn, 1949). This 
is a difference that can make a differ- 
ence since it is commonly reported that 
punishment produces an abrupt transi- 
tion from rapid responding to nonre- 
sponding whereas regular extinction 
typically produces a gradual decrease 
in speed of response (Brush, 1957; 
Moyer, 1957; Seward & Raskin, 1960).
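
The fifth point, that the choice of extinction criterion can change the apparent difference between procedures, can be illustrated with a minimal sketch. The two latency series are hypothetical (one slows gradually, one stops abruptly), the helper name `trials_to_criterion` is an assumption, and the two criteria are modeled on the 10-second single-trial and 120-second ten-trial rules mentioned above.

```python
# Hypothetical latencies in seconds, one per extinction trial. A value of 130
# stands for "no response within 120 seconds". Invented for illustration only;
# these are not data from any study cited in the text.
gradual = [2, 4, 8, 12, 20, 40, 80] + [130] * 10  # slows progressively
abrupt = [1] * 10 + [130] * 10                    # runs fast, then stops


def trials_to_criterion(latencies, limit_seconds, consecutive):
    """Return the 1-based trial on which the criterion is first met.

    The criterion is `consecutive` trials in a row whose latency is at least
    `limit_seconds`. Returns None if the criterion is never met.
    """
    run = 0
    for trial, latency in enumerate(latencies, start=1):
        run = run + 1 if latency >= limit_seconds else 0
        if run == consecutive:
            return trial
    return None


# Lenient criterion: one trial with a latency of 10 seconds or more.
lenient = (trials_to_criterion(gradual, 10, 1), trials_to_criterion(abrupt, 10, 1))
# Strict criterion: ten consecutive trials with no response in 120 seconds.
strict = (trials_to_criterion(gradual, 120, 10), trials_to_criterion(abrupt, 120, 10))
print("lenient criterion (gradual, abrupt):", lenient)
print("strict criterion (gradual, abrupt):", strict)
```

Under these assumed series the lenient criterion separates the two subjects by 7 trials (4 versus 11), whereas the strict criterion separates them by only 3 trials (17 versus 20), so the size of the measured difference depends on the criterion as well as on the behavior.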


Comparison of Punishment-Training 
with Regular Training 


Most studies involving the punish- 
ment of escape and avoidance acts 
have compared the punishment-extinc- 
tion procedure with the regular-ex- 
tinction procedure. In contrast, most 
studies involving the punishment of 
positive instrumental acts have com- 
pared the punishment-training pro- 
cedure to the regular-training pro- 
cedure. A few studies have made this 
comparison in the case of negative in- 
strumental responses. Church and 
Solomon (1956) found that punish- 
ment-training of escape responses of 
dogs in a shuttle box produced response 
suppression; Whiteis (1955, Experi- 
ment II) found that punishment-train- 
ing of avoidance responses of rats 
produced facilitation of response speed 
and a decrease in resistance to regular 
extinction; Shepard (1963), on the 
other hand, found that punishment- 
training of avoidance responses of rats 
produced a suppression of response 
speed, but it did not affect resistance 
to regular extinction. Although it is 
feasible to employ the punishment- 
training procedure in the case of nega- 
tive instrumental acts, the parameters 
are unexplored. 


Punishment of Responses during Se- 
lective Learning 


Maier and his associates have ob- 
tained considerable evidence consistent 
with the hypothesis that punishment of 
responses of a rat working on an insoluble
problem results in an abnormal
fixation of the response (Maier,
1949). Most of the experiments were
performed with rats working on two- 
choice discrimination problems in the 
Lashley jumping apparatus. If the 
subject makes the correct choice it 
jumps from the stand, hits a card
with its nose, and gains entry to a
platform on which it may eat; if the
subject makes an incorrect choice it 
jumps from the stand, bumps its nose 
when it strikes a card that is securely 
latched, and falls into a net. A rat may 
be trained to respond reliably either to 
a position (left or right) or to a symbol 
on the card (e.g., white circle on black 
background or black circle on white 
background). If the subject is con- 
fronted with an insoluble problem (i.e., 
the two cards are latched at random, 
without respect to position or symbol) 
most subjects refuse to jump. If they 
are forced to jump after 30 seconds by 
the administration of a blast of air, an 
electric shock, or a prod with a stick, 
they typically form strong position 
stereotypes. A number of studies have 
compared the performances of experi- 
mental subjects that are trained on an 
insoluble problem and control subjects 
that are trained to a given position 
(Klee, 1944; Maier, Glaser, & Klee, 
1940; Maier & Klee, 1943; Maier & 
Klee, 1945). Although subjects in 
both of these groups have the same 
level of performance toward the end 
of training, i.e., they are both respond- 
ing reliably to a given position, the 
experimental group is much less likely 
to learn a later response on the basis of 
reward. This difference becomes 
larger as the number of days of training
is increased (Maier & Feldman,
1948). 

There are several possible bases for
the higher resistance to change of the
experimental subjects with the insoluble
problem than of the control subjects
with the learned position response.
First, the experimental group is pun- 
ished on a random half of its trials 
whereas the control group is not pun- 
ished after it has learned the position 
response. Second, unlike the control
group, the experimental group is
avoiding or escaping from a noxious
stimulus on the platform. Third, the 
experimental group is rewarded on a 
random half of the trials; the control 
group is rewarded on all trials. Finally,
the experimental subjects make
more abortive jumps to the cards than 
the control subjects. Why is the re- 
sistance to change greater among the 
experimental subjects than among the 
control subjects? Avoidance re- 
sponses may be more resistant to ex- 
tinction than positive instrumental 
responses (Lichtenstein, 1957). The 
abortive responses of the experimental 
subjects may retard later learning 
(Wilcoxon, 1952) and, finally, par- 
tially reinforced responses may be 
more resistant to extinction than 
100% reinforced responses (Wilcoxon,
1952). Maier and Ellen (1951) and 
Maier (1956) have argued effectively 
that the alternative explanations have 
not been demonstrated as necessary for 
the observed effects and that they are 
insufficient to account for some of the 
observations that have been made. 
Nonetheless, the evidence at the pres- 
ent time is insufficient to demonstrate 
that the punishment involved in the 
treatment of subjects in an insoluble 
problem is involved in their high re- 
sistance to change. Clearly, further 
work is required with partial rein- 
forcement controlled and levels of 
punishment sufficiently low that no 
noxious stimulation need be applied in 
the starting platform. Under such 
conditions the effects of punishment 
of a response on resistance to extinc- 
tion could be assessed. 

Farber (1948) also obtained results 
which led him to conclude, “There 
can be little doubt that shock, as com- 
pared with the absence of shock, was 
effective in fixating the original re- 
sponse.” He trained four groups of 
rats to go to their preferred side of a 
T maze for 40 trials. The subjects 
in the experimental group were then
shocked immediately after their choice
responses; subjects in the control group
were not shocked. Following this pun- 
ishment-training or regular-training 
procedure, all subjects were given test 
trials involving reward for a response 
opposite their original preference. The
subjects in the control group that had 
previously received regular-training 
learned the new response more readily 
than subjects in the experimental group 
that had previously received punish- 
ment-training. As in the case of the 
experiments by Maier and his associ- 
ates, the interpretation of this result 
is complicated by the fact that it was 
necessary to administer an occasional 
shock to experimental subjects in the 
stem of the T maze to force them to 
run through the punishing shock. This 
procedure resulted in experimental and 
control subjects that were both moving 
to their preferred side of the T maze 
for positive reward at about the same 
speed, but the experimental subjects 
were also escaping or avoiding a shock 
in the stem and being punished with a 
shock after the choice point. When 
reversed, the experimental subjects 
were slower to change their response 
than the control subjects. This may 
have been because the experimental 
subjects had previously been punished 
for a response, but it also may have 
been because their escape or avoidance 
responses were more resistant to extinction
than the control subjects’ approach
responses. Further work on
this problem should certainly be done 
with equivalent treatment of experi- 
mental and control subjects in the stem 
of the maze. 


AN EVALUATION OF THE EFFECTIVENESS OF PUNISHMENT


Should punishment be used to re- 
duce the strength of a response? Some 
psychologists have opposed the use of 
punishment on the grounds that (a)
it is less effective than some of the
alternatives, (b) it produces undesirable
effects other than the reduction
of the strength of the response, and 
(c) it is unkind to the individual. We 
will quickly pass over the moral ob- 
jections after noting that, usually, the 
choice is not simply between reward 
and punishment. Whenever the al- 
ternative to punishment involves de- 
privation or extinction, the relative 
moral values are difficult to assess. 

Despite the objections to punish- 
ment, parents do punish their children, 
and most parents use physical punish- 
ment at least on some occasions. 
Sears, Maccoby, and Levin (1957) 
carried out a major interview study 
of the child rearing practices of moth- 
ers of 379 kindergarten children in 
two Massachusetts communities. They 
found that, although there were enor- 
mous differences among the respon- 
dents in the frequency and severity of 
physical punishment that they used 
on their children, 99% of the parents 
reported that they had spanked their 
children at least once. Curiously 
enough, in answer to the question, 
“How much good does it do to 
spank?,” only about half of the mothers 
were basically affirmative. Why 
should a parent spank his child if he 
believes it will do no good? It may 
be that his beliefs are more affected 
than his behavior by the cultural norms 
against punishment, or it may be that 
his punishment really is ineffective 
because it is not properly applied or 
because punishment per se is ineffec- 
tive. Some parents admit that they 
punished their children partly because 
they were angry. 

Two major alternatives to punish- 
ment for the reduction of the strength 
of a response are extinction of the 
response, and counterconditioning (extinction
of the response and reinforcement
of an incompatible response).
Both of these procedures require the 
identification of the source of reinforce- 
ment for the original response and the 
ability to eliminate that reinforcement, 
requirements often difficult to meet out- 
side of the laboratory. In the cases 
where the source of reinforcement can 
be identified and eliminated, evidence 
from animal investigations suggests 
that the addition of punishment will 
increase the speed of elimination of 
the response. Special precautions must 
be taken with the counterconditioning 
procedures so that the subject will not 
repeat the act scheduled for elimina- 
tion in order to get a countercondition- 
ing trial with positive reinforcement. 

Sears, Maccoby, and Levin (1957) 
state that punishment may be effective 
if it is combined with positive reward 
for some alternative response, but that 
by itself punishment has only a tem- 
porary effect. In some cases punish- 
ment serves to suppress behavior only 
as long as the punishment is applied;
as soon as the punishment is stopped, 
behavior returns to its former state 
(Azrin, 1960). Ideally, parents seek 
a technique to eliminate undesired be- 
havior that will last for a long period 
of time, even in the absence of the 
punishing agents. Whiting and Child 
(1953) have proposed that techniques 
of punishment involving a loss of love 
may be more effective than techniques 
involving physical punishment for the
production of guilt. Nonetheless,
strictly physical punishment, particu- 
larly if it is severe, can produce highly 
persistent response suppression. We 
have described cases of punishment in 
which the subject completely sup- 
pressed a previously learned instru- 
mental response until it died of starva- 
tion (Klee, 1944). 

Some unfortunate side effects of 
punishment are described by Sears, 
Maccoby, and Levin (1957). When a
parent punishes a child he gives him
a model for aggressive behavior, and 
the child may come to hate or fear the 
parent. They found that children of 
parents that used a considerable amount 
of physical punishment generally had 
more behavioral problems, particularly 
in the area of aggression, than did 
children of parents that used less 
physical punishment. Similar results 
have been reported by others (Ban- 
dura & Walters, 1959; Glueck & 
Glueck, 1950). There are various interpretations
of this type of correlation.
It may be that punishment produces
behavioral problems, that children with 
behavioral problems are more often 
punished than normal children, or that, 
for some reason, the technique of pun- 
ishment is currently practiced by the 
wrong sorts of parents, those charac- 
terized by problem children. Faced 
with such a multiplicity of explana- 
tions, and with no hope of disen- 
tangling them by means of the usual 
random assignment of subjects to 
treatments, we have not described 
these data in detail but, instead, have 
relied heavily upon the evidence from 
animal experimentation. Hopefully,
there is sufficient phylogenetic con- 
tinuity that the understanding of the 
effects of punishment of animals will 
contribute to our understanding of the 
effects of punishment on children. 


A SYNTHESIS 


Experiments on the effect of pun- 
ishment on behavior have found condi- 
tions under which punishment reliably 
produces total suppression, partial sup- 
pression, temporary suppression, and 
even facilitation of the punished re- 
sponse. With such a variety of ef- 
fects, any attempt at synthesis may be 
doomed to fail. Nevertheless, an examination
of the data suggests the following
generalization: The amount of
response suppression is greater, or the
amount of response facilitation is less, 
when the noxious stimulus is contingent
upon the occurrence of the response
(the punishment procedure)
than when the noxious stimulus is 
contingent upon the discriminative 
stimuli. 

Most of the data from experiments 
directly relevant to this generalization 
are consistent with the statement: (a) 
When the noxious stimulus is con- 
tingent upon the response the amount 
of response suppression is greater than 
when the noxious stimulus is admin- 
istered independently of the response, 
(b) the amount of response suppres- 
sion is inversely related to the time 
between response and punishment and 
it is inversely related to the interval 
between stimulus and punishment, (c) 
selective punishment of a quantitative 
or a qualitative characteristic of a re- 
sponse results in selective suppression 
of that characteristic, and (d) when 
punishment of an avoidance response 
results in response facilitation, the 
magnitude and duration of the effect is 
typically less than that obtained under 
conditions of noncontingent aversive 
stimulation. Thus an experimental 
subject that is punished may be com- 
pared with a control subject that re- 
ceives the same aversive stimuli de- 
pendent upon the discriminative stim- 
uli, not upon its responses, or a control 
subject that receives no aversive stimu- 
lation. When compared with this 
latter treatment, the effects of punish- 
ment are varied, but when compared 
with the former treatment, the invari- 
able result of punishment is response 
suppression. 

Two of the theoretical mechanisms 
that we have described are specifically 
designed to account for the empirical 
generalization that has been proposed, 
the suppression hypothesis and the 
avoidance hypothesis. In the case of 
the suppression hypothesis, some form
of inhibition of the response is postu- 
lated on trials in which the response 
does occur; in the case of the avoid- 
ance hypothesis, some form of rein- 
forcement for nonresponse is postu- 
lated on trials in which the response 
does not occur. Are there any dif- 
ferential consequences of these two 
statements? Response stereotypy on 
those trials in which the punished re- 
sponse does not occur would suggest 
that there was reinforcement for some 
specific nonresponse (avoidance hy- 
pothesis). An immediate change in 
performance as a result of a change of 
intensity of the noxious stimulus would 
suggest that there was inhibition from 
punishment (suppression hypothesis). 
Finally, an investigation of the se- 
quence of punished and unpunished 
trials for a number of presumably ho- 
mogeneous subjects during learning, 
or for a single subject at asymptotic 
performance, would give evidence re- 
garding the relative importance of 
punishment and nonpunishment in de- 
termination of the behavior (Bush & 
Mosteller, 1955, pp. 237-258). Of 
course, it is definitely possible that 
both factors are involved. To date 
there has been no empirical attempt 
to test the differential implications of 
the suppression hypothesis and the 
avoidance hypothesis, so that the choice 
between them is a matter of taste. 
Dinsmoor (1954) has argued effec- 
tively for the avoidance hypothesis on 
the grounds that it does not make any 
new assumptions, i.e., it does not make 
any assumptions that are not typically 
made in the explanation of avoidance 
learning. Nonetheless, for the ex- 
planation of punishment effects alone, 
the suppression hypothesis is the 
simplest, and no data are available to 
favor the avoidance hypothesis over the 
suppression hypothesis. 

Our empirical generalization may be 
satisfactory for those situations in
which punishment produces a suppres- 
sion of the response, but how is it 
possible to use such a generalization 
to account for those situations in 
which punishment produces response 
facilitation? The answer may be that 
response facilitation occurs in those 
instances not because the response was 
punished, but in spite of the fact that 
it was punished. Our generalization 
merely asserts that there will be more 
suppression (or less facilitation) if the 
noxious stimulus is contingent upon 
the response than if it is contingent 
upon the discriminative stimuli. For 
example, subjects that are punished 
for each avoidance response may take 
more trials than subjects under con-
ditions of regular extinction to reach
a criterion of extinction. Our empiri- 
cal generalization, however, leads us 
to expect that subjects that receive a 
shock of equal intensity and duration 
at the onset of the discriminative stim- 
ulus should have an even higher re- 
sistance to extinction. Unfortunately, 
this application is highly speculative
since most situations in which punish- 
ment has produced response facilita- 
tion have involved only a comparison 
of subjects that are punished with 
those that are not. 

The remaining problem is to under- 
stand the difference in behavior be- 
tween an experimental subject that re- 
ceives punishment after a response and 
a control subject that receives no aver- 
sive stimulation. Under what condi- 
tions does such aversive stimulation 
produce response facilitation, and un- 
der what conditions does it produce 
response suppression? (Because it 
avoids the suppressive effect of re- 
sponse-contingent punishment, it might 
be more fruitful to consider the differ- 
ence in behavior between a control 
subject that receives aversive stimu- 
lation contingent upon the discrimina-
tive stimulus and a control subject that
receives no aversive stimulation, but 
this would be outside the scope of a 
treatment on punishment.) 

In those cases in which punishment 
of a response results in facilitation of 
the response, various explanations are 
usually available. It may be (a) that 
the punishment reinstated one of the 
conditions of training (discrimination 
hypothesis), (b) that the aversive 
stimulus elicited fear which facilitated 
the response (fear hypothesis), (c) 
that the aversive stimulus elicited 
skeletal acts compatible with the 
punished act (competing response hy- 
pothesis), or (d) that the response 
associated with the termination of the 
aversive stimulus was compatible with 
the punished act (escape hypothesis). 

Unfortunately, there are few ex- 
periments explicitly designed to dem- 
onstrate the necessity of any one of 
these hypotheses. Typically, an in- 
vestigator has used one of these theo- 
retical mechanisms to account for ob- 
served facilitation, but the alternatives 
were certainly not excluded. The 
relevance of the competing response 
hypothesis is particularly difficult to 
demonstrate since the response itself 
is difficult to manipulate, and the speci- 
fication of “incompatibility” is uncer- 
tain. However, punishment of re- 
sponses elicited by an aversive 
stimulus, such as crying, heart-rate in- 
crease, and urination, may result in 
substantial facilitation of the response 
before any suppression is obtained. 

The escape hypothesis is directly 
testable since any response may be 
required by the experimenter to termi- 
nate the punishment. The assumption 
of the escape hypothesis is that there 
will be less suppression of the response 
if the escape response is similar to the 
punished response than if it is grossly 
different. There are many cases in 
which punishment of a negative instru-
mental response has resulted in facili-
tation of the punished response when 
the latter is similar to the response 
that terminated the aversive stimula- 
tion. Unless support for the escape 
hypothesis can also be found in the 
case of punishment of positive instru- 
mental responses, however, the ap- 
parent support for the escape hypothe- 
sis may occur only because punishment 
reinstated a condition of original 
training. 

Some variation of the fear hypothe- 
sis is certainly necessary to account for 
the response decrement usually ob- 
served when the subject has received 
noxious stimulation in the presence of 
discriminative stimuli. It is less certain 
whether or not such a fear hypothesis 
is required to account for the response 
facilitation sometimes observed when 
the subject is punished for a negative 
instrumental response. Apparently, 
more facilitation occurs if the specific 
aversive stimuli that aroused the fear 
in original learning are used to pun- 
ish than if different aversive stimuli 
are used. Thus the fear hypothesis 
becomes a variation of the discrimina- 
tion hypothesis, i.e., punishment may 
facilitate a response by reinstating a 
condition of training. The discrimina- 
tion hypothesis, when specifically 
tested, has been shown to be a useful 
idea. 

In comparison with a procedure in- 
volving no aversive stimulation, the 
effects of punishment are varied. If 
punishment reinstates a condition of 
original training, or if it elicits a re- 
sponse similar to the act that is being 
punished, then the procedure may pro- 
duce response facilitation. Otherwise, 
punishment will produce response sup- 
pression. In comparison with aver- 
sive stimulation contingent upon the 
discriminative stimuli, however, the 
effect of punishment is simple. It 
always produces suppression. 


SIMPLE CONDITIONING AS TWO-STAGE
ALL-OR-NONE LEARNING¹


JOHN THEIOS 


University of Texas 


A theoretical model which postulates learning to take place in 2 dis- 
crete steps is developed and applied to avoidance conditioning. Ac- 
cording to the model, conditioning is interpreted as an absorbing 
Markov process with three distinct states of conditioning. Among other 
things, the model predicts that the probability of a successful condi- 
tioned response should be constant on trials between the first success 
and the last error. Data are reported which support the constancy 
prediction as well as the model's other quantitative predictions. The 
model was generalized to apply to other learning situations, and 
relevant data were summarized. It would be quite difficult for a linear 
operator or habit-strength model to account for these data which
support the constancy prediction.


In many learning situations, the re- 
sponse under study has initially a zero 
probability of occurring, but as the 
experiment progresses, the response 
probability approaches an asymptote 
of unity. A few situations of this type 
are instrumental avoidance condition- 
ing, classical defense and appetitive 
conditioning, and reversal learning. 
These situations will be referred to as 
simple conditioning. In the past, 
simple conditioning has been analyzed 
in terms of response strength or linear 
operator models (Bush & Mosteller, 
1955; Estes, 1950; Hull, 1943), which 
assume that the strength or prob- 
ability of a learned response increases 
gradually during the course of learn- 
ing. Recently, it has been found that 
Markov models, which assume that 
learning takes place on single trials in 


¹ I am indebted to Gordon H. Bower for
deriving a number of the theoretical predic- 
tions and for his valuable advice during the 
development of this paper. Appreciation is 
also expressed to Patrick Suppes, William K. 
Estes, Frank Restle, and Robert R. Bush for 
their interest and encouragement. Much of 
this research was conducted during the 
author’s tenure on a National Science Founda- 
tion Cooperative Graduate Fellowship at 
Stanford University during the academic year 
1960-61. 


an all-or-none fashion, more ade- 
quately describe some types of verbal 
learning than do the linear models 
(Bower, 1962; Estes, 1960). It is 
quite possible that simple conditioning 
is also characterized by some sort of 
discrete learning as opposed to gradual 
learning. This possibility is further 
enhanced by the fact that the “zero
to unity” changes in response prob-
abilities, characteristic of simple con-
ditioning, should lend themselves 
nicely to discrete conditioning states, 
which an absorbing Markov model 
would require (cf. Suppes & Atkin- 
son, 1960). The present paper pre- 
sents a three-state absorbing Markov 
model for simple conditioning, and 
then compares the theoretical predic- 
tions to actual data collected in an 
extensive experiment on avoidance 
conditioning of rats. 


THE TWO-PATTERN STIMULUS
SAMPLING MODEL


The basic learning theory assumed 
in this paper is that proposed by 
Suppes and Atkinson (1960). They 
suggest that any learning situation 
can be represented by a finite set of 
stimulus patterns and that each com-
ponent pattern of the stimulus set is
conditioned to exactly one response 
from a set of mutually exclusive re- 
sponses available to the subject. On 
each trial the subject samples only 
one stimulus pattern and makes that 
response to which the sampled pattern 
is conditioned. If the subject makes 
an incorrect response, reinforcement 
occurs and elicits or forces the correct 
response, which becomes conditioned 
to the sampled pattern with a fixed 
probability, c. 

In simple conditioning only one 
response is reinforced consistently on 
every trial. The particular model 
to be proposed assumes that simple 
conditioning situations can be repre- 
sented by exactly two stimulus pat- 
terns and that on any given trial each 
pattern has probability .5 of being 
sampled by the subject. The funda- 
mental axioms of the model are as 
follows: 


Identification Axioms 


I1. A simple conditioning situation
may be represented by exactly two 
stimulus patterns. 

I2. At the start of conditioning, 
neither of the two patterns is condi-
tioned to the correct, $A_1$, response.


Conditioning Axioms 


C1. A stimulus pattern is condi- 
tioned to only one response at a 
given time. 

C2. The stimulus pattern that is 
sampled on a trial becomes condi- 
tioned to the reinforced response with 
a fixed probability, c. If the pattern 
is already conditioned to the rein- 
forced response, it remains so con- 
ditioned. 

C3. The stimulus pattern that is 
not sampled on a given trial cannot 
become conditioned to the correct 
response on that trial. 

C4. The probability, c, that the 
sampled pattern will become condi-
tioned to the reinforced response is 
independent of the trial number and 
preceding sequence of events. 


Sampling Axioms 


S1. Exactly one pattern is sampled 
on each trial. 

S2. Each of the two patterns has 
probability .5 of being sampled on a 
given trial. 

S3. On any trial, the probability of 
sampling a given pattern is in- 
dependent of the trial number and 
preceding sequence of events. 


Response Axiom 


R1. On any trial, that response is 
made to which the sampled pattern is 
conditioned. 
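
The axioms above can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name `simulate_subject`, the coding of responses (1 for a success, 0 for an error), the seed, and the value c = .427 used in the demonstration are choices made here, and each simulated sequence is stopped at the last error rather than at a run criterion.

```python
import random

def simulate_subject(c, rng):
    """Simulate one response sequence up to and including the last error.

    Returns a list of responses: 1 = success (A1), 0 = error (A2).
    """
    conditioned = [False, False]      # I2: neither pattern starts conditioned
    responses = []
    while not all(conditioned):       # stop once both patterns are conditioned
        k = rng.randrange(2)          # S1, S2: sample one pattern, p = .5 each
        if conditioned[k]:            # R1: respond as the sampled pattern dictates
            responses.append(1)
        else:
            responses.append(0)
            if rng.random() < c:      # C2: conditioning succeeds with probability c
                conditioned[k] = True
    return responses

rng = random.Random(0)                # fixed seed, used only for reproducibility
sequences = [simulate_subject(0.427, rng) for _ in range(2000)]
total_errors = [seq.count(0) for seq in sequences]
mean_errors = sum(total_errors) / len(total_errors)
```

Because conditioning can occur only on error trials, every simulated sequence contains at least two errors, and the mean number of errors should lie near 2/c.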

According to the axioms, the learn- 
ing process in simple conditioning may 
be described as an absorbing Markov 
process with three states. At the 
start of the experiment, when neither 
of the two stimulus patterns is condi- 
tioned to the correct response, the 
process is in conditioning state $S_0$.
After one of the patterns becomes
conditioned to the correct response,
the process is in conditioning state $S_1$,
where the probability of a correct
response is .5. Finally, when both
patterns are conditioned to the correct
response, the process is in the absorb-
ing state, $S_2$, where the probability of
a correct response is unity. The trees
of the Markov process are given in
Figure 1. The matrix of transition
probabilities, P, in canonical form is

$$
P =
\begin{array}{c|ccc}
 & S_2 & S_1 & S_0 \\ \hline
S_2 & 1 & 0 & 0 \\
S_1 & \frac{c}{2} & 1 - \frac{c}{2} & 0 \\
S_0 & 0 & c & 1 - c
\end{array}
$$
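
As a rough check on the canonical form, the matrix can be written out and stepped forward trial by trial. The helper names `transition_matrix` and `step` and the value c = .427 are assumptions made for this sketch.

```python
def transition_matrix(c):
    """Rows and columns ordered S2, S1, S0 (absorbing state first)."""
    return [
        [1.0,   0.0,       0.0],     # S2: absorbing
        [c / 2, 1 - c / 2, 0.0],     # S1: error w.p. 1/2, then conditioning w.p. c
        [0.0,   c,         1 - c],   # S0: every trial is an error
    ]

def step(dist, P):
    """Advance one trial: multiply a row vector of state probabilities by P."""
    return [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]

P = transition_matrix(0.427)
dist = [0.0, 0.0, 1.0]               # Trial 1: the process starts in S0
for _ in range(200):
    dist = step(dist, P)
```

After many trials essentially all of the probability sits in the absorbing state, which is what the canonical form implies.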
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PREDICTIONS AND DATA 


A large number of predictions which 
follow from the model will be derived 
in this section. As a test of the 
adequacy of the model, the predic- 
tions will be compared to data col- 
lected in an avoidance learning ex- 
periment where 50 rats served as sub- 
jects. The apparatus was a modi- 
fied Miller-Mowrer electric shock 
box, consisting of a black and a white 
compartment separated by a guillo-
tine door. The correct response ($A_1$)
was to run from one compartment to 
the other within 3 seconds after a 
buzzer sounded and a light came on in 
the white compartment. Special care 
was taken to reduce the stimulus 
situation drastically so that the situa- 
tion could be represented by only 
two stimulus patterns. The reduction 
was achieved, for example, by reduc- 
ing the external distractions for the 
rat, using a high intensity shock (255 
volts), running the rats only one way 
(e.g., always black to white) rather 
than having them shuttle, and giving 
all trials in one experimental session 
at 20 second intertrial intervals. The 
procedure was to place the subject in
one compartment and turn on the
buzzer and light as the door between
the compartments was opened. If
the subject did not run into the other
compartment within 3 seconds, he
was shocked until he escaped into the
other compartment. The buzzer,
light, and shock terminated when the
other compartment was entered. If
the rat’s response was an escape from
the shock, it was designated an error
($A_2$). If the response occurred before
the onset of the shock it was desig-
nated a success ($A_1$). After 20
seconds the subject was returned to
the first compartment, and another
trial was given. A rat was run until
he met a criterion of 20 consecutive
successful avoidance responses. When
the subject met the criterion he was
given reversal learning (e.g., if he had
originally learned to run from black
to white, during reversal learning, he
learned to run from white to black).
Since there were no significant differ-
ences between the data of original and
reversal learning (e.g., mean total
errors were 4.8 and 4.6, respectively),
the data from the two series were
pooled, yielding 100 response se-
quences to test the model.

FIG. 1. The trees of the Markov process.
[Tree diagram: conditioning state on Trial n,
response, and conditioning state on Trial n + 1.]


Bernoulli Properties of the Model 


One cannot observe on what trial
a transition from Conditioning State
$S_0$ to $S_1$ occurs. However, if there are
some trials between the first success
and the last error, we can be sure that
the subject is in State $S_1$ on these
trials. For surely, if the subject has
made one success, at least one of the
stimulus patterns is conditioned to the
$A_1$ response; and if on a later trial
the subject makes an error, then at
least one of the patterns is not condi-
tioned to the $A_1$ response. Since un-
conditioning does not occur in the
present model, the above two pat-
terns must be distinct, and, by
definition, the subject must be in
Conditioning State $S_1$. According to
the model, the probability of a
success in State $S_1$ is a constant, 1/2.
Also, the conditional probability of a
success on Trial n + 1 given a success
on Trial n is a constant, 1/2, since
conditioning cannot occur on success
trials:

$$
P(A_{1,n+1} \mid A_{1,n} \cap S_{1,n}) = \tfrac{1}{2}. \tag{2}
$$


Thus, according to the model, the 
sequence of responses between the 
first success and the last error should 
be an independent Bernoulli sequence 
with p = q = 1/2, and all statistics
relevant to coin tossing experiments 
should be applicable to the response 
sequences during these trials. 

For example, defining h as the
length of a run of successes between
the first success and the last error, the
probability distribution of h will be
equal to the expected probability
distribution of obtaining a run of
heads in a coin tossing experiment,
which is

$$
P(h = j) = p^{j-1} q = \left(\tfrac{1}{2}\right)^{j}, \qquad j = 1, 2, 3, \ldots \tag{3}
$$

The mean and variance of the dis-
tribution of h will be

$$
E(h) = \sum_{j=1}^{\infty} j \left(\tfrac{1}{2}\right)^{j} = 2, \tag{4}
$$

and

$$
\operatorname{Var}(h) = \sum_{j=1}^{\infty} (j - 2)^{2} \left(\tfrac{1}{2}\right)^{j} = 2. \tag{5}
$$


The obtained and predicted expecta- 
tion and standard deviation of the 
mean length of runs of successes be- 
tween the first success and the last 
error are given in Table 2. 
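
The mean and variance of the run-length distribution can be checked numerically. The sketch below assumes the geometric form with p = q = 1/2 and truncates the infinite sums at 199 terms, where the remaining tail is negligible; the function name `run_length_pmf` is hypothetical.

```python
def run_length_pmf(j, p=0.5):
    """Probability of a run of exactly j successes: j - 1 continuations, then an error."""
    q = 1 - p
    return p ** (j - 1) * q

terms = range(1, 200)                                   # truncated support
total = sum(run_length_pmf(j) for j in terms)
mean_h = sum(j * run_length_pmf(j) for j in terms)
var_h = sum((j - mean_h) ** 2 * run_length_pmf(j) for j in terms)
sd_h = var_h ** 0.5
```

The standard deviation obtained this way, the square root of 2, is the predicted value of 1.41 listed for the run length in Table 2.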

According to the linear models 
(Bush & Mosteller, 1955; Estes, 
1950; Hull, 1943), as n increases, the 
conditional probability of a success on 
Trial n following a success on Trial
n - 1 should also increase. But, the
two-pattern sampling model predicts
that on trials between the first
success and the last error, the condi-
tional probability of a success on Trial
n given a success on Trial n - 1
should have a constant value of 1/2.
The obtained conditional probabili-
ties, given in Figure 2, are approxi-
mating a constant value near 1/2, rather
than increasing with trials as the 
linear model would predict. This 
relationship is, by far, the strongest 
evidence for the two-pattern sampling 
model. 

Another exacting test of the Ber- 
noulli property of the model is that 
the response sequences during the 
trials between the first success and 
the last error should satisfy the 
binomial distribution. To provide 
this test, the data were divided into 
blocks of four trials, and the number 
of successes in each block was counted. 
This sum can take on the values 0, 1, 
2, 3, or 4. If the model fits the data, 
the obtained frequency distribution 
should not differ significantly from 
what would be expected from per-
forming a large number of coin tossing
experiments in which a coin was
tossed four times in each experiment
and the distribution of the number of
heads in each experiment was tabu-
lated. The obtained and predicted
frequencies of successes are given in
Table 1. A test of goodness of fit
yielded a chi square of 1.47, which,
with 4 degrees of freedom, indicates
that the predictions fit the obtained
data very well.

FIG. 2. Conditional probability of a suc-
cess on Trial n given a success on Trial n - 1
on trials between the first success and the last
error, $P(A_{1,n} \mid A_{1,n-1} \cap S_{1,n-1})$.

TABLE 1

Number of    Obtained     Predicted
successes    frequency    frequency
    0            2           3.1
    1           12          12.5
    2           17          18.8
    3           15          12.5
    4            4           3.1

Note. Chi square = 1.47, df = 4.
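
The predicted frequencies in Table 1 follow from the binomial distribution for four tosses of a fair coin, scaled by the number of blocks. A minimal sketch, in which the variable names are illustrative and the observed counts are those of Table 1:

```python
from math import comb

observed = [2, 12, 17, 15, 4]        # Table 1, blocks with 0..4 successes
n_blocks = sum(observed)             # number of four-trial blocks
# Binomial(4, 1/2) probabilities scaled to the number of blocks.
expected = [n_blocks * comb(4, s) * 0.5 ** 4 for s in range(5)]
```

Rounded to one decimal place, these values are the predicted frequencies listed in the table.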

If sampling of the two patterns is 
random, then the outcomes of trials 
between the first success and the last 
error should be statistically indepen- 
dent (a zero-order Markov process). 
This hypothesis can be tested against 
the alternative hypothesis that the 
process is a first-order Markov chain 
by a chi square test of homogeneity 
(cf., Suppes & Atkinson, 1960). The 
obtained chi square of .07 with one 
degree of freedom indicates that we 
cannot reject the hypothesis that the 
response sequence between the first 
success and the last error is a zero- 
order Markov chain. 
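
One way to set up such a homogeneity test is to tabulate the response on each trial against the response on the preceding trial and compute a chi square on the resulting 2 x 2 table. The sketch below is an assumption about the form of the test, not the procedure actually used; the function names and the demonstration sequences are hypothetical and are not data from the study.

```python
def transition_counts(sequences):
    """2 x 2 table: rows = response on Trial n - 1, columns = response on Trial n."""
    table = [[0, 0], [0, 0]]
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            table[prev][cur] += 1
    return table

def chi_square_homogeneity(table):
    """Pearson chi square for a 2 x 2 table (1 degree of freedom)."""
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            e = row[i] * col[j] / n          # expected count under independence
            chi2 += (table[i][j] - e) ** 2 / e
    return chi2

# Hypothetical response strings (1 = success, 0 = error), for illustration only.
demo = [[1, 0, 1, 1, 0, 0, 1, 0], [1, 1, 0, 1, 0, 0, 1, 1, 0]]
table = transition_counts(demo)
chi2 = chi_square_homogeneity(table)
```

A small chi square, as reported in the text, means that the response on one trial carries little information about the response on the next.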


Total Errors 


Suppose we let $t_i$ represent the num-
ber of errors made in transient Con-
ditioning State $S_i$, where i = 0, 1.
By the axioms, $t_i$ has the geometric
distribution given by

$$
P(t_i = j) = c(1 - c)^{j-1}, \tag{6}
$$

with mean and variance

$$
E(t_i) = \frac{1}{c}, \qquad \operatorname{Var}(t_i) = \frac{1 - c}{c^{2}}. \tag{7}
$$

We let T represent the total number
of errors before absorption, i.e.,

$$
T = t_0 + t_1. \tag{8}
$$

The variable T is the sum of two
independent, identically distributed
random variables. The probability
that T takes on an arbitrary value k
is given by the negative binomial
distribution

$$
P(T = k) =
\begin{cases}
0 & \text{for } k < 2, \\
(k - 1)\, c^{2} (1 - c)^{k-2} & \text{for } k \ge 2,
\end{cases} \tag{9}
$$

which has mean and variance

$$
E(T) = \frac{2}{c}, \qquad \operatorname{Var}(T) = \frac{2(1 - c)}{c^{2}}. \tag{10}
$$


It should be noted from Equation 9 
that the model makes the very strong 
prediction that the number of errors 
in any learning sequence must be 
equal to or larger than two, the 
number of stimulus patterns repre- 
senting the situation. This prediction 
follows from the assumption that 
conditioning can occur only on trials 
on which an error has been made. 

The expected total errors can serve 
as a stable estimator of the model’s 
single parameter, c. In the avoidance 
experiment described above, the mean 
total errors was 4.68. Equating E(T) 
in Equation 10 to 4.68, the resulting c 
value is .427. This estimate of c will
be fixed throughout the remaining 
discussion and will be used in the 
calculations of all the following theo- 
retical predictions. 
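
The estimation step amounts to solving E(T) = 2/c for c. The sketch below reproduces that arithmetic and checks that the resulting distribution of total errors is proper; the function name `total_errors_pmf` and the truncation of the sums at 400 terms are assumptions of this illustration.

```python
def total_errors_pmf(k, c):
    """P(T = k) for the two-pattern model (Equation 9)."""
    if k < 2:
        return 0.0
    return (k - 1) * c ** 2 * (1 - c) ** (k - 2)

mean_total_errors = 4.68                 # observed mean reported in the text
c_hat = 2 / mean_total_errors            # solve E(T) = 2 / c for c

ks = range(0, 400)                       # truncated support; the tail is negligible
mass = sum(total_errors_pmf(k, c_hat) for k in ks)
mean_T = sum(k * total_errors_pmf(k, c_hat) for k in ks)
```

Rounded to three places, `c_hat` is the value .427 used in the text, and the mean of the fitted distribution returns the observed 4.68.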

The obtained distribution of total
errors is given in Figure 3, along with
the theoretical predictions. It can
be seen that the model predicts the
data quite well.

FIG. 3. Probability distribution of total
errors, P(T = k). [Plot of probability against
number of errors (k), with data and theory curves.]

In deriving further predictions it is
useful to have the probabilities, $w_{i,n}$,
that the subject is in Conditioning
State $S_i$ (i = 0, 1, 2) on Trial n of the
experiment (n = 1, 2, 3, ...). The
result for $w_{0,n}$ is

$$
w_{0,n} = (1 - c)^{n-1}. \tag{11}
$$

For a subject to be in State $S_1$ on Trial
n we note that he must have remained
in State $S_0$ for k trials (k = 1, 2, ...,
n - 1) before moving to $S_1$ and then
have remained in State $S_1$ for
(n - k - 1) trials. Thus, the prob-
ability of being in State $S_1$ on Trial n
will be given by

$$
w_{1,n} = \sum_{k=1}^{n-1} (1 - c)^{k-1}\, c \left(1 - \frac{c}{2}\right)^{n-k-1}, \tag{12}
$$

which has the solution

$$
w_{1,n} = 2\left[\left(1 - \frac{c}{2}\right)^{n-1} - (1 - c)^{n-1}\right]. \tag{13}
$$

Having obtained the probabilities of
being in Conditioning States $S_0$ and $S_1$
on Trial n, the probability of being in
$S_2$ may be obtained by subtraction:

$$
w_{2,n} = 1 - w_{0,n} - w_{1,n}
        = 1 - 2\left(1 - \frac{c}{2}\right)^{n-1} + (1 - c)^{n-1}. \tag{14}
$$
Once the subject has arrived at Con-
ditioning State $S_2$ there can be no
more errors. Hence, $w_{2,n}$ gives the
probability of no more errors after
Trial n - 1. The observed and theo-
retical proportions of response se-
quences having no errors following
Trial n - 1 are given in Figure 4. The
c value used in the predictions is .427, 
which was estimated from the mean 
total errors. 
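
The closed-form state probabilities can be compared with a direct trial-by-trial recursion over the three conditioning states. In this sketch the function names are hypothetical, and the recursion simply restates the transition probabilities of the model (c out of the initial state, c/2 out of the intermediate state).

```python
def state_probs(n, c):
    """Closed forms for the probabilities of S0, S1, S2 on Trial n."""
    w0 = (1 - c) ** (n - 1)
    w1 = 2 * ((1 - c / 2) ** (n - 1) - (1 - c) ** (n - 1))
    return w0, w1, 1 - w0 - w1

def state_probs_by_recursion(n, c):
    """Same quantities obtained by stepping the chain one trial at a time."""
    w0, w1, w2 = 1.0, 0.0, 0.0           # Trial 1: the process starts in S0
    for _ in range(n - 1):
        w0, w1, w2 = (w0 * (1 - c),
                      w0 * c + w1 * (1 - c / 2),
                      w1 * (c / 2) + w2)
    return w0, w1, w2
```

For any trial number the two routes should agree to within rounding error, which gives an independent check on the algebra behind the closed forms.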


Trial Number of the Last Error 


A subject's last error can occur on
Trial n only if he is in Conditioning
State $S_1$, samples the unconditioned
pattern, and conditioning of that
pattern is effective. Thus, the prob-
ability distribution of the trial number
of the last error, P(L = n) for n = 1,
2, 3, ..., is

$$
P(L = n) = \tfrac{1}{2}\, c\, w_{1,n}
         = c\left[\left(1 - \frac{c}{2}\right)^{n-1} - (1 - c)^{n-1}\right], \tag{15}
$$

which has mean and variance

$$
E(L) = \frac{3}{c}, \qquad \operatorname{Var}(L) = \frac{5 - 3c}{c^{2}}. \tag{16}
$$

FIG. 4. Proportion of response sequences having no
errors following Trial n - 1 ($w_{2,n}$). (These
curves also represent the cumulant of the dis-
tribution of the trial number of the last error.)
The obtained and predicted means
and standard deviations of the trial
number of the last error are given in
Table 2. The cumulant of the dis-
tribution of the trial number of the
last error is just $w_{2,n}$, the probability
of no errors following Trial n - 1,
which was given in Figure 4.
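
The moments of the trial number of the last error can be verified numerically from the verbal description: the subject is in the intermediate state on Trial n, samples the unconditioned pattern (probability 1/2), and conditioning is effective (probability c). The function name `last_error_pmf`, the value c = .427, and the truncation at 599 trials are assumptions of this sketch.

```python
def last_error_pmf(n, c):
    """P(L = n): in S1 on Trial n, sample the unconditioned pattern, condition it."""
    w1 = 2 * ((1 - c / 2) ** (n - 1) - (1 - c) ** (n - 1))
    return w1 * 0.5 * c

c = 0.427
ns = range(1, 600)                       # truncated support; the tail is negligible
mass = sum(last_error_pmf(n, c) for n in ns)
mean_L = sum(n * last_error_pmf(n, c) for n in ns)
var_L = sum((n - mean_L) ** 2 * last_error_pmf(n, c) for n in ns)
```

The numerical mean and variance should match 3/c and (5 - 3c)/c² respectively.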

The average probability of an
error on Trial n will be equal to the
probability of being in State $S_0$ on
Trial n times the probability of an
error in $S_0$ plus the probability of
being in State $S_1$ on Trial n times the
probability of an error in $S_1$. Thus,

$$
\begin{aligned}
P(A_{2,n}) &= 1 \cdot w_{0,n} + \tfrac{1}{2}\, w_{1,n} \\
           &= (1 - c)^{n-1} + \tfrac{1}{2} \cdot 2\left[\left(1 - \frac{c}{2}\right)^{n-1} - (1 - c)^{n-1}\right].
\end{aligned} \tag{17}
$$

The probability of a success on Trial
n is given by

$$
P(A_{1,n}) = 1 - P(A_{2,n}) = 1 - \left(1 - \frac{c}{2}\right)^{n-1}. \tag{18}
$$

The obtained and predicted mean
learning curves are given in Figure 5.
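
The predicted learning curve can be tabulated directly from the state probabilities. In the sketch below the function names and the choice of 20 trials are illustrative, and c = .427 is the estimate given earlier in the text.

```python
def p_error(n, c):
    """Probability of an error on Trial n: certain in S0, one half in S1."""
    w0 = (1 - c) ** (n - 1)
    w1 = 2 * ((1 - c / 2) ** (n - 1) - (1 - c) ** (n - 1))
    return 1.0 * w0 + 0.5 * w1

def p_success(n, c):
    """Probability of an avoidance response on Trial n."""
    return 1 - p_error(n, c)

curve = [p_success(n, 0.427) for n in range(1, 21)]
```

The two terms in the error probability combine so that it equals (1 - c/2) raised to the power n - 1; the curve therefore starts at zero and rises monotonically toward unity.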

Although many more predictions
follow from the model, further exposi-
tion of specific predictions would run
the risk of unduly burdening the
reader with mathematical material.
However, for those readers interested
in using the model in further research,
sequential statistics and other pre-
dictions are given in the Appendix,
with only brief explanations. The
complete derivations of these predic-
tions and the corresponding obtained
data can be found in a technical
report (Theios, 1961). Sequential
data for individual rats are presented
in Table 4 of the Appendix for readers
interested in checking other predic-
tions or alternative models.

FIG. 5. Probability of an avoidance re-
sponse, $P(A_{1,n})$, on successive trials of the
experiment.

To summarize this section on the
two-pattern model, a number of pre-
dictions are compared with obtained
data in Table 2. In general, the pre-
dictions approximate the obtained
data quite well.

TABLE 2

OBSERVED AND PREDICTED VALUES FOR
VARIOUS RESPONSE MEASURES

Response measure                        Observed   Predicted

Total errors
    E(T)                                  4.68
    σ(T)                                  2.34       2.48
Trial number of last error
    E(L)                                  6.56       7.02
    σ(L)                                  3.40       4.52
Errors before the first success
    E(J_0)                                2.96       3.03
    σ(J_0)                                1.83       2.14
Probability of no reversals                .26        .30
Mean number of runs of errors             2.18       2.18
Runs of errors, r_j, of length j
    r_1                                   1.05       1.05
    r_2                                    .46       [illegible]
    r_3                                    .30        .27
Autocorrelation of errors, c_k, k
trials apart
    c_1                                   2.50       2.62
    c_2                                   2.06       2.06
    c_3                                   1.47       1.61
    c_4                                   1.18       1.27
Mean length of runs of successes
in State S_1
    E(h)                                  1.77       2.00
    σ(h)                                  1.14       1.41


General N-Pattern Models 


Any experiment in which the initial
probability of a correct response is
zero, but asymptotically becomes
unity can be analyzed in terms of an
absorbing Markov model. However,
it would be premature to suggest that
all learning situations of this type
could be represented by two stimulus
patterns, i.e., a two-stage learning
process. It would be wiser to assume
that the complexity of the individual
learning situation determines the
number of stimulus patterns or stages
of learning. In other words, more
stimulus patterns would be necessary
to represent complex learning situa-
tions than would be necessary for
simple situations. Thus, a general
N-pattern stimulus sampling model
where the number of patterns (N)
is a parameter which must be esti-
mated from the data, would be of
greater scientific value than the simple
two-pattern model.

An N-pattern model can be repre-
sented by a Markov chain that has N
transient states and one terminal
absorbing state. Suppose we let $t_i$
represent the number of errors in
transient State $S_i$, for i = 0, 1, 2,
..., N - 1. It can be shown that $t_i$
has the geometric distribution given
earlier in Equation 6,

$$
P(t_i = j) = c(1 - c)^{j-1}.
$$

We let T represent the total number
of errors before absorption, i.e.,

$$
T = t_0 + t_1 + t_2 + \cdots + t_{N-1}. \tag{19}
$$

The variable T is the sum of N
independent, identically distributed
random variables. The probability
that T takes on an arbitrary value k
is given by the negative binomial
distribution

$$
P(T = k) =
\begin{cases}
0 & \text{for } k < N, \\
\binom{k-1}{N-1} c^{N} (1 - c)^{k-N} & \text{for } k \ge N,
\end{cases} \tag{20}
$$

which has mean and variance

$$
E(T) = \frac{N}{c}, \qquad \operatorname{Var}(T) = \frac{N(1 - c)}{c^{2}}. \tag{21}
$$

It should be noted from Equation 20
that the stimulus sampling models of
the type we have been considering
make the very strong prediction that
the number of errors in any learning
sequence must be equal to or larger
than the number of stimulus patterns
representing the situation.

It is quite easy to obtain an estimate
of the number of stimulus patterns
for an N-pattern model which assumes
all of the axioms given earlier, except
Axioms I1 and S2. For any experi-
ment in which the probability of a
correct response increases from zero to
unity, one can compute the mean and
variance of the total number of errors
before perfect learning. The obtained
mean and variance can be set equal to
the general theoretical equations for
the mean and variance of the total
errors (Equation 21), and the two
equations can be solved simultane-
ously for the conditioning parameter,
c, and the number of stimulus pat-
terns, N. For the present data on
avoidance conditioning in rats, this
estimate of N is 2.019, indicating that
the two-pattern model is most ap-
plicable for these data.
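
Solving the two moment equations simultaneously gives c = mean / (mean + variance) and N = c × mean. The sketch below checks this algebra by a round trip on hypothetical parameter values (N = 3, c = .4), which are not estimates from any data set; the function names are likewise assumptions.

```python
from math import comb

def n_pattern_pmf(k, n_patterns, c):
    """P(T = k) for an N-pattern model: negative binomial on k >= N."""
    if k < n_patterns:
        return 0.0
    return comb(k - 1, n_patterns - 1) * c ** n_patterns * (1 - c) ** (k - n_patterns)

def moment_estimates(mean, var):
    """Solve mean = N / c and var = N (1 - c) / c**2 for (c, N)."""
    c = mean / (mean + var)
    return c, c * mean

# Round-trip check on hypothetical parameter values.
true_n, true_c = 3, 0.4
mean = true_n / true_c
var = true_n * (1 - true_c) / true_c ** 2
c_hat, n_hat = moment_estimates(mean, var)
```

In practice the estimate of N will rarely be an integer, so it is rounded to the nearest whole number of patterns, as is done for the data discussed in the text.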
Solomon and Wynne Data and the N-
Pattern Model


To demonstrate its generality, the 
N-pattern model will now be applied 
to the data of an experiment by 
Solomon and Wynne (1953) in which 
30 dogs learned an avoidance response. 
This application of the model is of 
great interest since Bush and Mos- 
teller (1959) have already fitted eight 
models to these data, with varying 
degrees of success. The procedure 
used by Solomon and Wynne differed 
from that of the rat avoidance study 
already considered in that they used 
the more difficult shuttle response 
which requires the subject to jump to 
a place where he has been previously 
shocked. It might be expected that 
the shuttle procedure would result in 
a stimulus situation which was more 
complex than the nonshuttle situation. 
If so, the estimate of the number of 
stimulus patterns should be greater 
for the shuttle situation. 

For the Solomon and Wynne data, 
the estimate of the conditioning 
parameter, c, is .538 and the estimated 
number of stimulus patterns, N, is 4.2. 
Rounding 4.2 to the nearest integer, 
it follows that some four-pattern 
model may describe these data better 
than Bush and Mosteller’s (1955) 
two-operator linear model which for 
so long has been assumed to give the 
best description of these data. The 
obtained probability distribution of 
total errors for the Solomon and 
Wynne data is given in Figure 6, 
along with the predictions from the 
two-pattern and four-pattern models. 
The predictions from the four-pattern 
model approximate the data very 
well, while the predictions from the 
two-pattern model are quite discrepant.
It should be noted that the
very strong prediction that the total
number of errors for any subject
cannot be less than the number of
stimulus patterns was upheld in the
data of the dogs as it was in the data
of the rats.


Fig. 6. Probability distribution of total
errors for the Solomon and Wynne dogs, and
the predictions from the two-pattern and
four-pattern Markov models. (Vertical axis:
probability that T = i.)

The mean and variance of the trial
number of the last error, L, predicted
by the four-pattern model are

$$E(L) = \frac{25}{3c} \tag{22}$$

and

$$\mathrm{Var}(L) = \frac{96 - 75c}{9c^{2}}. \tag{23}$$

For a general N-pattern model, the
expected trial number of the last
error will be given by

$$E(L \mid N\text{-patterns}) = \frac{N}{c} \sum_{i=1}^{N} \frac{1}{i}. \tag{24}$$


Because of the tedious algebra in- 
volved, other predictions for the four- 
pattern and N-pattern models are not 
yet available in closed form. 

In Table 3, the obtained means and
standard deviations of total errors and
trial number of the last error are
compared with the predictions from
the four-pattern model and the linear
model (Bush & Mosteller, 1955).


TABLE 3

A COMPARISON OF AVOIDANCE CONDITIONING
DATA OF REAL DOGS WITH LINEAR
STAT-DOGS AND FOUR-PATTERN
MARKOV PREDICTIONS

Response measure      | 4-Pattern | Data  | Linear
Total errors          |           |       |
  E(T)                |   7.43    |  7.80 |  7.60
  σ(T)                |   2.53    |  2.52 |  2.27
Trial of last error   |           |       |
  E(L)                |  15.47    | 12.33 | 13.53
  σ(L)                |   4.62    |  4.36 |  4.78
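The four-pattern entries for E(T), σ(T), and E(L) can be approximately reproduced from the reported estimate c = .538. The sketch below assumes the negative binomial moments E(T) = N/c and Var(T) = N(1 - c)/c^2, and assumes that the expected trial of the last error is (N/c) times the sum of 1/i for i = 1 to N; function names are illustrative. The variance of L is not computed here.

```python
from math import sqrt


def predicted_total_errors(n_patterns: int, c: float) -> tuple[float, float]:
    """Mean and standard deviation of total errors (negative binomial)."""
    mean = n_patterns / c
    sd = sqrt(n_patterns * (1 - c)) / c
    return mean, sd


def predicted_last_error_trial(n_patterns: int, c: float) -> float:
    """Expected trial number of the last error under the assumed formula."""
    return (n_patterns / c) * sum(1 / i for i in range(1, n_patterns + 1))


if __name__ == "__main__":
    mean_t, sd_t = predicted_total_errors(4, 0.538)
    print(round(mean_t, 2), round(sd_t, 2),
          round(predicted_last_error_trial(4, 0.538), 2))
```

Small differences from the tabled values are to be expected, since c is given to only three decimals.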


DISCUSSION 


A two-pattern stimulus sampling 
model, which can be represented as a 
three-state, absorbing Markov chain, 
has been presented to account for 
simple conditioning. A large number 
of predictions about sequences of 
response random variables were de- 
rived in closed form and were com- 
pared to actual data obtained in an 
avoidance conditioning experiment 
with rats. The theoretical predictions 
fit the data extremely well. In par- 
ticular, the predicted Bernoulli prop- 
erties of the response sequences be- 
tween the first success and the last 
error were upheld surprisingly well by 
the data. This characteristic of the 
data is sufficient to question the 
adequacy of any linear model as a 
description of the present data. It 
should be noted, also, that no esti- 
mated parameters entered into the 
predictions about the response se- 
quences during trials between the 
first success and the last error, since 
the Bernoulli properties of the model 
are parameter-free. 

The predictions about various re- 
sponse measures over the entire 
learning sequence involved only a 
single parameter, c. Although c was 
estimated from an arbitrarily selected
response measure, mean total errors, 
the predictions for the numerous other 
response measures fit the data very 
well. It should be noted that entire 
distributions of response variables 
and not just mean values were fitted 
to the data. 

To demonstrate the generality of 
the present approach to conditioning, 
an N-pattern model was developed 
and applied to the data of Solomon 
and Wynne (1953) on avoidance 
conditioning of dogs. Although the 
avoidance situation for rats (uni- 
directional response) could be repre- 
sented by two stimulus patterns, it 
was found that four stimulus patterns 
were necessary to represent the avoid- 
ance situation for dogs (shuttle 
response). This result is in line with 
the expectation that the shuttle 
situation is more complex than the 
unidirectional situation. 

The two-pattern and N-pattern 
Markov models have been presented 
as applying to simple conditioning 
situations where the response prob- 
abilities increase from zero to unity. 
However, to date the models have 
been applied only to avoidance learn- 
ing data. The axioms assume that 
learning occurs only on trials on which 
an error is made, and, in fact, 
in avoidance conditioning an ex- 
perimenter-controlled reinforcement 
(shock) occurs only on these error 
trials. Therefore, it is of interest 
whether this assumption of learning 
only on error trials will be upheld by, 
say, classical conditioning or T maze 
reversal data, where an experimenter- 
controlled reinforcement occurs on 
both error and success trials. 

Another question which remains to 
be answered is why the probability 
of a correct response on trials between 
the first success and the last error is 
.5 in the data from the rats. Of course
this prediction follows from the two-pattern
model, but it is difficult to
find intuitive reasons why this should 
be the case. More reasonably, it 
might be expected that the probability 
of a correct response on these trials 
should be determined, at least partly, 
by variables such as the CS-UCS 
interval. For example, one would 
logically expect the response prob- 
ability to decrease as the CS-UCS 
interval is decreased, and vice versa. 
Working on the assumption that 
specific situational factors may affect 
various parameters in the two-pattern 
stimulus sampling model, Bower 
(1961) has developed a general two- 
pattern model, of which the model 
presented in this paper may be 
thought of as a special case. Instead 
of each stimulus pattern having an 
equal sampling probability of .5, in 
the Bower model one pattern has 
probability a of being sampled on a 
given trial, while the other pattern 
can be sampled with probability 
1—a. This model, too, predicts 
stationarity of the response prob- 
abilities on trials between the first 
success and last error. However, the 
response probability on these trials 
does not have to be exactly .5, but 
can take on any arbitrary value be- 
tween zero and unity. Thus, if 
stationarity of the response prob- 
abilities between the first success and 
the last error were found in a learning 
situation, but not at .5, some variation 
of the general Bower model would 
probably be most appropriate. 

The learning of individual paired- 
associate response shifts by humans 
has been found to satisfy the restric- 
tions of the Bower model (Theios & 
Hakes, 1962). Using color names as 
stimuli and numbers as responses, the 
correct response for a stimulus was 
shifted to one of the alternate re- 
sponses when the subject reached a 
criterion. As predicted by the Bower
model, the relative frequency of a 
correct response on each trial be- 
tween the first success and the last 
error was constant (about .7), and 
errors on these trials were distributed 
binomially. In general, the fit of the 
Bower model to the data is good. 

Since the model developed in this 
paper is very abstract, it might be 
interesting to assume a reductionist 
point of view and attempt to relate 
the present formulation to more tradi- 
tional interpretations of avoidance 
conditioning. The notion of avoid- 
ance conditioning as a two-stage 
process in which the subject starts in 
one state and moves through an inter- 
mediate state to a third, terminal 
state is not new. Solomon and Wynne
(1954) and Wynne and Solomon 
(1955) proposed an avoidance con- 
ditioning process with three states of 
conditioning. They suggest that au- 
tonomic nervous system responses are 
important in the initial state, but play 
at most a minor role in the terminal 
state. 

To speculate now about identifica- 
tion of the states in the present model, 
we may view the organism as being 
naive when the experiment starts. In 
this initial, naive state the organism 
will get shocked on every trial. The 
UCS (e.g. shock) will cause au- 
tonomic nervous system responses in 
the organism as well as skeletal 
responses. It has been customary to
designate autonomic responses produced
by shock as “emotional” responses.
The first stage of learning
may consist of conditioning the “emotional”
responses to the CS. After
this, the organism may be viewed as
being in the intermediate state, where
the presentation of the CS will cause
the organism to become “emotionally”
aroused. The autonomic or emotional
arousal, which may be considered
as an increase in drive level,
results in an increase in the probability 
that the subject makes an avoidance 
response by chance. Because of the 
symmetry of the situation and the 
CS-UCS timing in the present experi- 
ment with rats, the probability of 
crossing before the onset of shock 
happened to be .5. The second stage 
of learning may consist of conditioning 
the instrumental, skeletal escape re- 
sponse to the CS, to the stimuli from 
the “emotional” responses, or both. 
Since the axioms of the model as well 
as the data from the rats indicate that 
conditioning of the instrumental re- 
sponse can occur only on shock trials 
and cannot occur on avoidance trials 
in the intermediate state, the rein- 
forcement for the conditioning of the 
instrumental avoidance response must 
be shock reduction and not emotion 
or fear reduction. This follows since 
avoidance responses in the inter- 
mediate state would reduce emotional 
responses, yet conditioning of the in- 
strumental response does not occur 
following these avoidance trials. 
Thus, the subject cannot avoid the 
shock until he becomes emotional in 
the situation. Following conditioning 
of the emotional responses, he has 
some fixed probability of avoiding by 
chance, until finally the instrumental 
avoidance response becomes learned. 
After this, the subject avoids the 
shock on every trial. 

This speculative interpretation 
gains some merit when latency data 
are considered. Wynne and Solomon 
(1955) have published curves of re- 
sponse latency for individual dogs 
which look like step functions with 
two jumps. The response latency 
initially is quite long, but constant, 
for the first few trials of the condi- 
tioning experiment. Then the la- 
tencies shorten to just about the 
CS-UCS interval (10 seconds) and 
remain constant there for a few trials. 


Finally, the latencies become quite 
short, about 2 seconds, and the subject 
avoids the shock on every trial. The 
first jump in the curves could be 
interpreted as the conditioning of 
emotional responses to the CS and the 
second jump interpreted as the con- 
ditioning of the instrumental response. 

However, in spite of the intuitive 
appeal of a reductionist type of 
interpretation over a purely abstract 
mathematical approach, it should be 
remembered that the real test of the 
appropriateness of a psychological 
model is how well it predicts behavior. 
The two-pattern stimulus sampling 
model predicts avoidance conditioning 
data of rats extremely well. The 
predictive ability of the model should 
be the main consideration in evaluat- 
ing the model. How well the model 
approximates intuition or traditional 
theoretical conceptualizations should 
be of lesser concern. 
APPENDIX


SEQUENTIAL STATISTICS AND OTHER PREDICTIONS FROM THE TWO-PATTERN MODEL 


Defining $J_0$ as the number of errors
before the subject's first success, the
probability that $J_0$ takes on an arbitrary
value, k, will be given by

$$P(J_0 = k) = \begin{cases} \tfrac{1}{2}c & \text{for } k = 1 \\ c(1 - c)^{k-2}\left[1 - (1 + c)\left(\tfrac{1}{2}\right)^{k}\right] & \text{for } k \geq 2. \end{cases} \tag{25}$$

This probability distribution has a mean
equal to

$$E(J_0) = \frac{1}{c} + \frac{1}{1 + c}. \tag{26}$$

A response sequence with no reversals
can be defined as a sequence in which no
errors occur after the first success.
Sequences with no reversals can occur
only if there have been at least two initial
errors. This prediction follows from the
model because conditioning can occur
only after an error has been made and
there are two stimulus patterns which
must become conditioned before perfect
learning is attained. The probability of
obtaining a sequence without any reversals is

$$P(NR) = \frac{c}{1 + c}. \tag{27}$$

In the avoidance experiment with rats
the obtained proportion of nonreversal
sequences was .26, while the predicted
value was .30.

Letting $J_k$ represent the number of
errors between the kth and (k + 1)th successes,
the expectation of $J_k$ is

$$E(J_k) = \left(\frac{1}{1 + c}\right)^{k+1} \quad \text{for } k \geq 1. \tag{28}$$

The cumulative number of errors before
the kth success, $F_k$, can be obtained
by adding the values of $J_i$ from i = 0 up
to i = (k - 1). The expectation of $F_k$
will be

$$E(F_k) = \sum_{i=0}^{k-1} E(J_i) = \frac{2}{c} - \frac{1}{c}\left(\frac{1}{1 + c}\right)^{k}. \tag{29}$$

The limit of $E(F_k)$ as k approaches
infinity is the expected mean total errors
(2/c) as it should be.

Consider the number of successes
before a subject’s last error. If we let Z
represent this sum, then Z can take on
the values 0, 1, 2, 3, .... It can be shown
that the probability distribution of Z
will be given by

$$P(Z = k) = c\left(\frac{1}{1 + c}\right)^{k+1} \quad \text{for } k = 0, 1, 2, \ldots. \tag{30}$$

The distribution of Z has a mean and
variance equal to

$$E(Z) = \frac{1}{c}, \qquad \mathrm{Var}(Z) = \frac{1 + c}{c^{2}}. \tag{31}$$
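These closed-form predictions can be checked against a small Monte Carlo simulation of the two-pattern model. The sketch below assumes the three-state chain implied by the text: an error occurs with probability 1 in the initial state, .5 in the intermediate state, and 0 in the absorbing state, and a transition to the next state occurs with probability c after each error. The function names and the dictionary keys are illustrative only.

```python
import random


def simulate_subject(c: float, rng: random.Random) -> list[int]:
    """Return one response sequence (1 = error, 0 = success), ending
    with the last error, i.e. the error on which absorption occurs."""
    state = 0  # 0 = initial, 1 = intermediate, 2 = absorbing
    responses = []
    while state < 2:
        p_error = 1.0 if state == 0 else 0.5
        error = rng.random() < p_error
        responses.append(1 if error else 0)
        # Conditioning can occur only on error trials, with probability c.
        if error and rng.random() < c:
            state += 1
    return responses


def summarize(c: float, n_subjects: int = 20000, seed: int = 1) -> dict[str, float]:
    """Estimate E(T), E(J_0), E(Z), and P(NR) by simulation."""
    rng = random.Random(seed)
    total = j0 = z = no_reversal = 0
    for _ in range(n_subjects):
        seq = simulate_subject(c, rng)
        total += sum(seq)
        # Errors before the first success; all errors if no success occurs
        # before absorption.
        j0 += seq.index(0) if 0 in seq else len(seq)
        z += seq.count(0)            # successes before the last error
        no_reversal += 0 not in seq  # no success precedes the last error
    return {
        "mean_total_errors": total / n_subjects,
        "mean_errors_before_first_success": j0 / n_subjects,
        "mean_successes_before_last_error": z / n_subjects,
        "prop_no_reversal": no_reversal / n_subjects,
    }


if __name__ == "__main__":
    print(summarize(0.5))
```

For c = .5 the formulas above give 2/c = 4, 1/c + 1/(1 + c) = 2.67, 1/c = 2, and c/(1 + c) = .33, and the simulated values should lie close to these.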
TABLE 4
SEQUENTIAL DATA FOR INDIVIDUAL RATS


Original learning Reversal learning 
Rat |1|o|ıjojıjojı| 1 |o]1]o/1 jo|1 lo} 1 
1}4)1)1)1)1 63/3 
2)4)1\3 2 

3/3) 4\1 211/1 
4|4/5)1 3/1] 1)1)1/1]1 
5|6|2)2 241)3 
6|2|1|1|1ļ|1 1j1j1 

7\4 3 

8)2)1)1 1/1)1 
9)1)1)2 4|1|2|3|1 
10) 4)2)2)1}1 3 

11 |3|1|2 1/2|1 

12 |3|1|1 1/1)1)4]1 
13|5/)1}1 4 

14 |4 3ļ|1|2 

15 |4 1/1|1|1j1j1j1 
16|3|21 2ļj1j1 

17 |2 ttia 

18 |1 1)2)1)/3]1] 4)2)1)1)1 
19|2|1|4|1ļ|1 7 

20 |5 OLTENI 
21|2|1|4|2j1 1)1)2)1)1)3)2)2)1 
22|2|5|1 2 
23|5|1|3|2|2 2)1)1)2])1 
24)2/1/1 4/3/1/1/1/1/2 
25|4|4|2 3|2|1|2|1]2|1|1|1 
26 |3 3 

27|1|1|3 213 
28|2|1|1 1/1)1)1)1 
29/3)1/1 2 

30 {5/2} 1 1/3)1)1}1 
31/5 LETENI 
32|3|1|1|1|2 11312 
33|4|1|1 3|5j1 
34/1/1/1 4 

35|3|1|1 111/3 
36|1|2|1 21/31 
37|1|1|11|3 2 

38 |1|3|2 1)2/1 

39/4 4/2)4 
40|5 10 |1|2|3|3 
41|1|21 3|2|/3|1|1|1|1|2|1 
42 |2 HATE S2 
43|1|1|3|4|1|1j1| 4 
44|3|1|2|2|1|4|/2| 1|3|1|1]1 
45|6|1|1|5]|2 3 

46 |4 2 

47 |2|3|2|5]|3 2|1|1|4]|2 
48 | 8 1)1)1 

49 |4 2)1)2)1)1 
50[3|2|1 4|2}1 


Note. The columns represent consecutive runs of
failures to avoid (errors) and runs of avoidance responses
(successes). (The columns labeled 1 designate
runs of errors. The columns labeled 0 designate runs of
successes. The entries of the table give the length of
the runs in number of trials. The final criterion run of
20 successes has not been indicated, but it follows the
last entry for each rat.)
Sequential Statistics


In deriving sequential statistics it is
useful to define a sequence of response
random variables, $X_n$, which take on the
value 1 if an error occurred on Trial n or
the value 0 if a success occurred on Trial
n. From the axioms, the conditional
probabilities of an error given State $S_i$ are

$$P(X_n = 1 \mid S_{0,n}) = 1, \qquad P(X_n = 1 \mid S_{1,n}) = \tfrac{1}{2}, \qquad P(X_n = 1 \mid S_{2,n}) = 0. \tag{32}$$

Using the notation of Bush (1959), a
j tuple of errors will be defined as $u_j$
where

$$u_{j,n} = X_n X_{n+1} \cdots X_{n+j-1} \quad \text{for } j = 1, 2, \ldots \tag{33}$$

and

$$E(u_{j,n}) = w_{0,n}\left[(1 - c)^{j-1} + c(1 - c)^{j-2}\left(1 - \left(\tfrac{1}{2}\right)^{j-1}\right)\right] + w_{1,n}\left(\tfrac{1}{2}\right)^{j}(1 - c)^{j-1}. \tag{34}$$

The expected number of j tuples of
errors will have the solution

$$E(u_j) = E\left(\sum_{n} u_{j,n}\right) = \frac{(1 - c)^{j-2}}{c}\left[1 + (1 - 2c)\left(\tfrac{1}{2}\right)^{j-1}\right] \quad \text{for } j \geq 2. \tag{35}$$

The value of $u_1$ will be given by mean
total errors, which has expectation 2/c.
Bush (1959) has shown that predictions
about runs of errors can be obtained once
the expected $u_j$ are known. Defining
total error runs as R, the expectation
of R is

$$E(R) = E(u_1) - E(u_2) = 1 + \frac{1}{2c}. \tag{36}$$

Letting $r_j$ be the number of error runs of
Length j, for j = 1, 2, 3, ..., the expectation
of $r_j$ is

$$E(r_j) = E(u_j) - 2E(u_{j+1}) + E(u_{j+2}),$$

$$E(r_j) = \begin{cases} \dfrac{1}{4c}\left(1 + c + 2c^{2}\right) & \text{for } j = 1 \\ (1 - c)^{j-2}\left[c + \dfrac{(1 - 2c)(1 + c)^{2}}{c}\left(\tfrac{1}{2}\right)^{j+1}\right] & \text{for } j \geq 2. \end{cases} \tag{37}$$

The number of alternations of errors and
successes, A, over the entire learning
sequence will be twice the number of
error runs minus one. Thus, the expectation
of A will be

$$E(A) = 2E(R) - 1 = 1 + \frac{1}{c}. \tag{38}$$

The obtained and predicted values for
runs of errors are given in Table 2.

Another useful summary of sequential
characteristics in the response data is the
extent to which an error on Trial n tends
to be followed by an error k trials later,
irrespective of the intervening responses.
Define the autocorrelation, $c_{k,n}$, as the
product $X_n X_{n+k}$, which will have the
value 1 if errors occurred on both Trials
n and n + k and the value 0 otherwise.
The expectation of $c_{k,n}$ is

$$E(c_{k,n}) = w_{0,n}\left(1 - \tfrac{1}{2}c\right)^{k} + w_{1,n}\left(1 - \tfrac{1}{2}c\right)^{k-1}\tfrac{1}{4}(1 - c). \tag{39}$$

The expectation of $c_{k,n}$ over all trials
will be

$$c_k = E\left(\sum_{n} c_{k,n}\right) = \frac{3 - 2c}{2c}\left(1 - \tfrac{1}{2}c\right)^{k-1}, \tag{40}$$

for k = 1, 2, 3, .... The obtained and
predicted autocorrelations of errors for
the first few values of k are given in
Table 2.
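The run statistics follow mechanically from the expected numbers of j tuples, so they are convenient to compute numerically. The sketch below assumes the two-pattern closed forms E(u_1) = 2/c and E(u_j) = ((1 - c)^(j-2) / c) [1 + (1 - 2c)(1/2)^(j-1)] for j >= 2, together with Bush's relations between tuples and runs; all function names are illustrative.

```python
def expected_tuples(j: int, c: float) -> float:
    """Expected number of j tuples of errors, E(u_j)."""
    if j == 1:
        return 2.0 / c  # mean total errors
    return (1 - c) ** (j - 2) / c * (1 + (1 - 2 * c) * 0.5 ** (j - 1))


def expected_runs(c: float) -> float:
    """Expected total number of error runs, E(R) = E(u_1) - E(u_2)."""
    return expected_tuples(1, c) - expected_tuples(2, c)


def expected_runs_of_length(j: int, c: float) -> float:
    """Expected number of error runs of exactly length j."""
    return (expected_tuples(j, c)
            - 2 * expected_tuples(j + 1, c)
            + expected_tuples(j + 2, c))


def expected_alternations(c: float) -> float:
    """Expected alternations of errors and successes, 2 E(R) - 1."""
    return 2 * expected_runs(c) - 1


def autocorrelation(k: int, c: float) -> float:
    """Expected number of error pairs exactly k trials apart (assumed form)."""
    return (3 - 2 * c) / (2 * c) * (1 - c / 2) ** (k - 1)


if __name__ == "__main__":
    c = 0.4  # illustrative value, not an estimate from the paper
    print(expected_runs(c), expected_alternations(c))
    print([round(expected_runs_of_length(j, c), 3) for j in range(1, 6)])
```

Two identities provide a useful check: the run counts summed over all lengths equal E(R), and the run lengths weighted by their counts sum to the mean total errors, 2/c.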
A CONTINGENT REINFORCER?


AND JOHN J. SIMMONS, III


Rutgers University 


The present report describes a series of 4 experiments undertaken to ex- 
amine an hypothesis specifying a condition under which a stimulus may 
acquire reinforcing properties. Hypothesis I: A neutral stimulus con- 
sistently presented contingent upon a response acquires reinforcing 
properties for similar responses. All of the experiments involved a 
training series wherein the stimulus was presented contingently upon a 
response and a testing series wherein the stimulus was used in a
conventional operant reinforcement situation. The reinforcing effect
required by Hypothesis I was demonstrable in all 4 experiments.


The present report describes a series 
of four experiments undertaken to ex- 
amine a hypothesis specifying a con- 
dition under which a stimulus may 
acquire reinforcing properties. This 
hypothesis, which is a part of a broad 
view of motivational phenomena, ap- 
plies to positive reinforcement and is 
stated in general terms. 

Hypothesis I: A neutral stimulus 
consistently presented contingent 
upon a response acquires reinforcing 
properties for similar responses. 

In each experiment, a series of train- 
ing trials (wherein the initially neutral 
stimulus is presented contingent upon 
the response as specified by Hypothe- 
sis I) is followed by a series of testing 
trials (wherein the stimulus serves as 
the reinforcer in the familiar condi- 
tioning paradigm). 

The first two experiments are con- 
cerned primarily with showing that a 
stimulus presented in the response- 
contingent context specified by Hy- 
pothesis I has reinforcing properties. 

The third experiment examines the
readiness with which the reinforcing
characteristics of the stimulus decay
subsequent to the response-contingent
training trials. The fourth experiment
incorporates several controls
which were suggested during the
course of the first two experiments,
extends the validity of the hypothesis by
using a different sensory modality and
a different response from those in the
first three experiments, and explores
the possible effect of differences in the
intensity of the response-contingent
stimulus during the training trials.
Each experiment was conducted by a
different psychologist in a different
school system (Adler, 1960; Lukacs,
1962; Sharrock, 1961; Simmons, 1960)
and, as a consequence, there are minor
variations in the conditions for the
experiments.


1 This investigation was supported by a
grant from the Rutgers University Research
Council.


APPARATUS FOR EXPERIMENTS
I, II, AND III


In Experiments I, II, and III pushing a 
button which closed a microswitch was the response
under consideration, and a small green
jeweled panel light was the response-contingent
feedback stimulus.

The apparatus was assembled on a board 3 
feet wide and 2 feet deep. The subject's side 
of the board was separated from the experimenter’s
side by a 2 foot vertical panel which
served to protect the subject from being distracted
by the experimenter’s operations. On
the subject’s side of the panel there was a
metal box, 7 × 8 × 2 inches. On this box and
near the subject was a large button on which
the subject rested his hand when he was ready
for a response. Near the large button was a
small red jeweled panel light which informed
the subject that the experimenter was pre- 
pared for the response. At the far edge of the 
box were five apertures, each with a sliding 
cover which extended through the vertical 
panel. Each cover could be readily withdrawn 
by the experimenter to expose a microswitch 
button mounted in the aperture on the sub- 
ject’s side of the panel. The subject's response 
comprised removing his hand from the large 
resting button and depressing one of the five 
microswitch buttons. Mounted on the vertical 
panel at a distance of 2 inches above each 
microswitch response button was a small 
green light. These lights were controlled by 
the experimenter so that subsequent to any 
preselected response he could provide a 
sensory feedback for the subject. 

Thus by opening and closing the covers for 
the response buttons the experimenter could 
determine the response possible during train- 
ing trials and present a choice of responses 
during testing trials. On the experimenter's 
side of the panel, signal lights revealed the 
subject's response, and switches permitted 
the experimenter to supply the feedback light 
above any one of the response buttons. 


EXPERIMENT I
Method 


The subjects were 128 public school children 
(60 girls and 68 boys) from the fifth and sixth 
grades. Experiment I required four different 
training groups of 32 children each, and the 
order in which the child appeared determined 
the training group to which he was assigned. 
For all groups the experimental sequence com- 
prised two contiguous portions: a block of 
40 training trials and an immediately follow- 
ing block of 24 testing trials. Although each 
of the four groups was trained in a different 
manner, they were all tested under an iden- 
tical procedure. The children were trained 
and tested individually in a room with only 
the subject and the experimenter present. 

After the child was seated in front of 
the apparatus, he was read the following 
instructions: 


This is a kind of game. Here are 4
buttons² (pointing) each with its own cover
(demonstrating). On your left you see a
big button. This is the resting button. On
your right, you see a red light. This light
will flash when I give you the signal to begin.
Now, rest the index finger of your left hand
on the big resting button. Wait for the red
light on your right to flash. As soon as it
does, lift your index finger from the resting
button and use it to press down whichever
of these little buttons you see uncovered.
Then move your index finger back to the
resting button, wait for the red signal light
to flash and repeat what you did before.
This will be done many times. Let's try
it just a few times for practice.


2 Only four of the five possible buttons were
exposed; because of mechanical difficulties,
the button furthest to the right was not used.


After the block of training trials had been com- 
pleted, the children were given the following 
instructions before they began the 24 testing 
trials: 


We are going to do the same thing that 
we did before only now you will choose 
which little button you want to press each 
time the red light flashes, since they will 
all be uncovered.


The nature of the training sequence for each 
group is described in the following paragraphs: 

Group 1 (not trained to meet the require- 
ments of Hypothesis I—a varying response 
with no systematically contingent stimulus): 
A child assigned to Group 1 found one response 
button uncovered on each of his training 
trials; the location of this button varied from 
trial to trial according to a predetermined 
counterbalanced pattern which resulted in 
each button’s being pressed an equal number 
of times. For one-half of these training 
trials no feedback light was flashed; for 
the other 20 responses a light was flashed, 
and each of the various response button- 
feedback light combinations was presented 
at least once to each subject and no 
button contiguous feedback light was pre- 
sented more than once. The light-no light 
training trials occurred in an unbiased 
sequence. 

Group 2 (trained to meet the requirements 
of Hypothesis I—a varying response with a 
systematically contingent stimulus): For each
of the 40 training trials the button pressing 
response was followed by a flash from the 
feedback light directly above the exposed 
button. The buttons were exposed one at a 
time according to a predetermined schedule of 
trials which permitted the subject to respond 
to each button an equal number of times. 

Group 3 (not trained to meet the require- 
ments of Hypothesis I—a consistent response 
with no systematically contingent stimulus):
The children assigned to Group 3 pressed the
same response button for each of their training
trials. The one response button continuously
exposed for the child was determined
by the order in which he participated in the 
study. For one-half of these training trials 
a feedback light was flashed, but each of the 
possible feedback lights was flashed an equal
number of times by the same predetermined
counterbalanced schedule used in Group 1.

Group 4 (trained to meet the requirements
of Hypothesis I—a consistent response with 
a systematically contingent stimulus): Each 
child in Group 4 found the same response 
button exposed throughout each of his 40 
training trials. The button exposed for any 
particular child was determined by order of 
participation in the experiment. Each time 
this response button was pushed the feedback 
light immediately above it flashed.

Regardless of the manner in which the subject
had been trained, all four of the response
buttons were exposed during the 24 testing 
trials. The subject was free in each trial to 
push the button he desired, and for Groups 1 
and 2 the button for which he was to receive 
reinforcement had been predetermined by the 
experimenter’s schedule. Whenever the sub- 
ject depressed the particular button for which 
he was scheduled to be reinforced, the feed- 
back light directly above that button would 
flash. For the subjects in Groups 3 and 4, the 
button followed by the reinforcing feedback 
was the same button which had been con- 
sistently exposed throughout his 40 training 
trials. 


Results 


In each of the 24 testing trials (the 
reinforcing trials) the subject was con- 
fronted with four response buttons. Ac- 
cordingly, there was one chance in four 
that he would push the button which 
was to be followed by the reinforcing 
stimulus. In the series of 24 testing 
trials we would expect the button re- 
ceiving reinforcement to be depressed 
six times on the basis of chance alone. 
In Groups 1 and 3, where the training 
trials did not meet the requirements of 
Hypothesis I, only 3 and 8 subjects in 
the respective groups of 32 pressed 
the button followed by the feedback 
light (gave the reinforced response) 
more than 6 times. In Groups 2 and 
4, where the training met the require- 
ments of Hypothesis I, 14 and 13 sub- 
jects in the respective groups of 32 
gave the reinforced response more than 
6 times. 
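The chance baseline used here can be made explicit. The sketch below assumes that, under chance responding, each of the 24 testing trials is an independent choice among four buttons with probability 1/4 of selecting the reinforced one; it computes the expected count and the binomial probability of exceeding a given count. The function name is illustrative, and the independence assumption is a simplification rather than a claim about the children's behavior.

```python
from math import comb


def prob_more_than(threshold: int, n_trials: int, p: float) -> float:
    """P(X > threshold) for X distributed Binomial(n_trials, p)."""
    return sum(
        comb(n_trials, k) * p ** k * (1 - p) ** (n_trials - k)
        for k in range(threshold + 1, n_trials + 1)
    )


N_TRIALS = 24
P_CHANCE = 1 / 4

# Expected number of reinforced responses under chance: 24 * 1/4 = 6.
expected_by_chance = N_TRIALS * P_CHANCE

# Chance probability that a single subject exceeds 6 reinforced responses.
p_exceed_six = prob_more_than(6, N_TRIALS, P_CHANCE)

if __name__ == "__main__":
    print(expected_by_chance, round(p_exceed_six, 3))
```

Multiplying `p_exceed_six` by 32 gives the number of subjects per group who would be expected to exceed 6 reinforced responses under this chance model, which can be set beside the observed counts of 3, 8, 14, and 13.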

In Table 1 we find that the fewest
reinforced responses occurred in
Groups 1 and 3 (189 and 200, respectively).
In Groups 2 and 4, however,
(the two groups with training trials
meeting the requirement of Hypothesis I)
the number of reinforcible responses
was 217 and 219, respectively.
It should be noted that the number of
correct responses during the first 12
trials is approximately the same for all
four training groups, but for Groups 2
and 4 there is a substantial increment
in the number of reinforced responses
during the last 12 trials. This appreciable
increment for Groups 2 and
4 and the lack of an appreciable increment
for Groups 1 and 3 suggest that
the feedback light was an effective
reinforcer for Groups 2 and 4 and that
for Groups 1 and 3 the feedback light
was not an effective reinforcer.


TABLE 1

NUMBER OF REINFORCED RESPONSES FOR
THE VARIOUS TRAINING GROUPS:
EXPERIMENT I

Group        |  N | First 12 testing trials | Second 12 testing trials | 24 testing trials
1            | 32 |           96            |            93            |       189
2            | 32 |          101            |           116            |       217 a
3            | 32 |           96            |           104            |       200
4            | 32 |           98            |           121            |       219 a
Combination  |    |                         |                          |
1 and 3      | 64 |          192            |           197            |       389
2 and 4      | 64 |          199            |           237            |       436 b
1 and 2      | 64 |          197            |           209            |       406
3 and 4      | 64 |          194            |           225            |       419

a The performance of Groups 2 and 4, receiving
response-contingent light training, is superior at the
5% level to the performance of Group 1 which had no
response-contingent light training.

b The performance of Groups 2 and 4 combined is
superior, 5% level, to the performance of Groups 1 and
3 combined which received no consistent light training.

The major trends revealed in Table 
1 are significant. When the results of 
the investigation were placed in a 
2 × 2 factorial design (response-contingent
light training versus variable
response training), it was found that
the performance of Groups 2 and
4 (trained with response-contingent
light) was reliably superior to the performance
of Groups 1 and 3 (trained
without response-contingent light). 
There was no reliable difference be- 
tween the performance of Groups 1 
and 2 (trained with a variable re- 
sponse) and the performance of 
Groups 3 and 4 (trained with a consistent
response). t tests revealed that
the performances of both Group 2 and 
Group 4 were superior to the perform- 
ance of Group 1. Other differences 
between the groups were not reliable. 
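The marginal totals of the 2 × 2 design can be recovered from the four group totals reported in Table 1. The following sketch only rearranges those published totals into the two factors; it does not reproduce the significance tests, whose error terms are not given in the text. The variable names are illustrative.

```python
# Total reinforced responses over the 24 testing trials, from Table 1.
group_totals = {1: 189, 2: 217, 3: 200, 4: 219}

# Factor 1: response-contingent light training (Groups 2 and 4) or not.
contingent_light = group_totals[2] + group_totals[4]
no_contingent_light = group_totals[1] + group_totals[3]

# Factor 2: variable response (Groups 1 and 2) or consistent response.
variable_response = group_totals[1] + group_totals[2]
consistent_response = group_totals[3] + group_totals[4]

# Differences between the levels of each factor, in responses per 64 subjects.
light_effect = contingent_light - no_contingent_light
response_effect = consistent_response - variable_response

if __name__ == "__main__":
    print(contingent_light, no_contingent_light, light_effect)
    print(variable_response, consistent_response, response_effect)
```

The difference associated with response-contingent light training is several times larger than the difference associated with response consistency, which is the pattern the analysis above describes.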

The results of these analyses corre- 
spond with the requirements of Hy- 
pothesis I. A stimulus feedback which 
has regularly followed a response may 
show some of the characteristics of a 
reinforcer in the sense that such a feed- 
back can increase the frequency of 
some similar response preselected to 
precede it regularly in the familiar 
conditioning paradigm. 

The comparisons between the groups 
trained with a consistent response 
(Groups 3 and 4) and those trained 
with a variable response (Groups 1 
and 2) are of theoretical interest 
because they help identify an essen- 
tial quality of the response-contingent 
feedback stimulus. Specifically, un- 
der the assumption that those subjects 
trained with a consistent response are 
getting a consistent response-contin- 
gent proprioceptive feedback stimulus 
these comparisons indicate that the 
proprioceptive feedback from the re- 
sponding organs is not sufficient for 
Hypothesis I and that the feedback 
must be external to the responding 
organs (whether it must have an origin 
in the external environment is a topic 
for future investigation).


EXPERIMENT II 


The general purposes of Experiment 
II are to seek confirmation of the im- 
plication of Experiment I by using a 
different procedure and to examine
the possibility that experience with
the stimulus alone is sufficient to 
provide the expected reinforcing qual- 
ities. In order to do this three addi- 
tional conditions for training were re- 
quired. One of them, applied to 
Group 5, provided the response- 
contingent training trials for a stimu- 
lus as required by Hypothesis I. 
Groups 6 and 7 provided response 
training and stimulus training, re- 
spectively, but not according to the 
contingent conditions required by the 
hypothesis. 


Method 


Experiment II employed the same apparatus
as Experiment I, and a sample of 144 second 
grade children was assigned to the three train- 
ing groups. 

In Experiment II, as in Experiment I, it is 
necessary to distinguish between the training 
trials and the testing trials. For each of the 
three groups required (Groups 5, 6, and 7) 
there were 20 training and 30 testing trials. 

During the training trials for Group 5, only 
the center response button was available to 
the subject. In response to the light signal 
from the experimenter, the subject immedi- 
ately removed his hand from the testing 
button and depressed the center response 
button. Since the training for Group 5 in- 
volved a consistent response and a regularly 
occurring response-contingent feedback light 
stimulus, it will be recognized as similar to 
Group 4 in Experiment I. As soon as the sub- 
ject removed his hand from the center re- 
sponse button, the feedback light above this 
response button was flashed for approximately 
one second. After a pause of from 3 to 5 
seconds this sequence was repeated. 

For Group 6 the button pressing procedure 
was identical with that for Group 5, but no 
feedback light was flashed at any time. For 
Group 7 no response button was exposed. The 
training for this group comprised only observing
the center light flash at intervals of 3 to
5 seconds through a series of 20 flashes. 

It will be recognized that Group 5 meets 
the training requirements for Hypothesis I. 
Group 6 fails to meet the requirements for 
Hypothesis I by omitting the feedback light 
although it provides the responses, and Group 
7 fails to meet the requirements of Hypothesis 
I by omitting the responses despite the fact 
that it does provide the feedback light.

The testing trials were identical for all
three groups. The center response button was 
covered, and the covers were removed from 
the response button immediately to the left 
and the response button immediately to the 
right of the center button. Thus each child 
was confronted with a choice situation, with 
one of the alternative responses to receive a 
reinforcement in the form of a flash from the 
center feedback light. Whether a child was 
to be reinforced for pressing the left button or 
the right button was predetermined on a ran- 
dom basis. 

According to the provisions of Hypothesis I, 
the following predictions were made for the 
three training groups: 

1. More than one-half of the Group 5 sub- 
jects would prefer the response button which 
led to the feedback light. 

2. More subjects from Group 5 than from 
either Group 6 or Group 7 would prefer the 
button leading to the feedback response. 

3. The average number of responses where 
the button led to the feedback light would 
be greater for the Group 5 subjects than for 
the subjects of either Group 6 or Group 7. 


Results 


On the basis of chance alone one 
would expect that half of the subjects 
in each group would prefer the re- 
sponse leading to the light and half 
would not. It was found, however, 
that in Group 5, 72% of the 48 
children preferred the response lead- 
ing to the light. For the 48 children 
in Group 6, 53% preferred the re- 
sponse leading to the light, while for 
the 48 children comprising Group 7, 
48% preferred the response leading to 
the light. The portion of the Group 
5 subjects preferring the response lead- 
ing to the light was significantly 
greater than the portion of Group 6 or 
Group 7 subjects preferring a response 
leading to the light and was signi- 
ficantly greater than that predicted on 
the basis of chance alone. Thus it is 
apparent that the requirements for 
Predictions 1 and 2 are both fulfilled. 
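
The comparison with chance can be illustrated by an exact one-tailed binomial test. The following is only a sketch: the report does not name the test that was applied, the function name is hypothetical, and the head counts are assumptions obtained by rounding the reported percentages of 48 children to whole numbers.

```python
from math import comb


def binomial_upper_tail(successes: int, n: int, p: float = 0.5) -> float:
    """Exact probability of `successes` or more out of `n` when each child chooses at rate `p`."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(successes, n + 1))


N = 48
# ASSUMPTION: counts reconstructed by rounding the reported 72%, 53%, and 48% of 48 children.
assumed_counts = {5: round(0.72 * N), 6: round(0.53 * N), 7: round(0.48 * N)}

for group, count in assumed_counts.items():
    p_value = binomial_upper_tail(count, N)
    print(f"Group {group}: {count}/{N} preferred the lit response, one-tailed p = {p_value:.4f}")
```

Under these assumed counts only Group 5 departs reliably from the 50% expected by chance, which agrees with the conclusion stated for Prediction 1.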

On the basis of chance alone it
would be predicted that 15 of the 30
testing trials would show a preference
for the response button which was
followed by a feedback light. It was
found in Group 5 that the average
number of trials which led to the
feedback light was 18. In Group 6 the
average was 15.96, and in Group 7 the
average was 15.31. A one-tailed test
of significance shows the average
number of responses leading to a light to
be significantly greater for Group 5
than for either Group 6 or Group 7.
Thus the requirement for Prediction
3 is fulfilled.


TABLE 2

AVERAGE NUMBER OF REINFORCED RESPONSES
FOR GROUPS TRAINED UNDER
DIFFERENT CONDITIONS:
EXPERIMENT II

Group | Training trials              | N  | 30 testing trials
  5   | Response-contingent light    | 48 | 18.00*
  6   | No response-contingent light | 48 | 15.96
  7   | Light only                   | 48 | 15.31

* Significantly different from the means for the
groups not trained with a response-contingent light
at the 5% level, one-tailed test.

Experiment II affirms the validity 
of Hypothesis I. In addition, with 
its use of Group 7, Experiment II in- 
dicates that it is not merely a famili- 
arity with the stimulus used in the 
feedback which confers its reinforc- 
ing potential. It appears, instead, 
that the reinforcing value of a stimu- 
lus is acquired when the stimulus regu- 
larly follows a response. 


EXPERIMENT III 


Experiment III explores the stability 
of the reinforcing properties acquired 
under the response-contingent condi- 
tions required by Hypothesis I. The 
procedure was intended to reveal 
whether the acquired reinforcing char- 
acteristics of a previously neutral 
stimulus would be retained during the 
course of a 24-hour period. 


Method


One hundred and eight third-grade children 
were assigned at random to five groups. They 
participated in a series of 30 training trials 
followed by a series of 60 testing trials wherein 
two alternative response buttons were avail- 
able but only one was to be reinforced (see 
Experiment II). The trials for each child 
were conducted individually. Groups 8, 9, 
10, and 11 were trained in an identical manner 
to establish a comparable reinforcing potential 
for a previously neutral light stimulus. For 
Group 12 the training trials omitted the feed- 
back light; comparisons with this group pro- 
vide the necessary control for the data from 
Groups 8, 9, 10, and 11. 

The reinforcing efficacy of the light stimulus
was tested for all groups. Group 8 was
tested immediately after training. For Group 
9 the testing trials were run 1 hour after train- 
ing. For Group 10 testing trials were run 
after an intervening period of 3 hours, and 
for Group 11 the testing trials were run ap- 
proximately 24 hours after the completion of 
the training. Group 12, the control group, 
was tested immediately (as was Group 8). 


Results 


The results of Experiment III are
summarized in Table 3 which gives
the mean number of successful, i.e.,
reinforced, trials for each of the five
groups. It is apparent that each of
the four experimental groups which
received the green light feedback in
their training trials provides more
reinforcible responses in the testing
situation than Group 12 which did not
experience the feedback light during
the training. Since the direction of
this difference was anticipated by
Hypothesis I and had been established
in two earlier investigations, the
significance of these differences for the
total series of 60 trials is examined by
a one-tailed application of the t test.
The differences between the mean for
control Group 12 and training Groups
8, 9, and 11 are all significant at the
5% level. The difference between the
control Group 12 and training Group
10 does not generate a t which is
significant at the 5% level, but the
statistic (t = 1.64) is significant at the
10% level under the one-tailed test.
Thus it is apparent that even after an
interval of 24 hours the initially neutral
light stimulus retains a reinforcing
quality which is stronger than that
found for the group with no response-
contingent light training. Although
the means for Groups 10 and 11 which
were submitted to a 3 hour and a 24
hour interval, respectively, between
training and testing appear to be
smaller than the means for Groups 8
and 9 which experienced respectively
no interval and 1 hour interval
between training and testing, there
are no significant differences between
any pair of these means. The greatest
difference was between Groups 9 and
10 and yielded a t of 1.34 which does
not approach any conventional criterion
for significance.


TABLE 3

AVERAGE NUMBER OF REINFORCED RESPONSES
FOR GROUPS TESTED AT VARIOUS
INTERVALS AFTER TRAINING:
EXPERIMENT III

Group          | Delay before testing | N  | First 30 testing trials | Second 30 testing trials | 60 testing trials
 8             | None                 | 20 | 16.60a                  | 17.10                    | 33.70a
 9             | 1 hour               | 21 | 17.19a                  | 18.33a                   | 35.52a
10             | 3 hours              | 25 | 15.08                   | 16.60a                   | 31.68
11             | 24 hours             | 21 | 16.10                   | 16.43                    | 32.53a
12 (Control)b  | None                 | 20 | 13.45                   | 13.40                    | 26.85

a Significantly different from the mean for the control
group (12) at the 5% level, one-tailed test.
b Control group had no response-contingent light
during training.
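
As a check on the tabled values, the means for the two halves of the testing series should add up to the mean for the full 60 trials. The sketch below performs that arithmetic on the figures read from Table 3; the variable and function names are illustrative and are not part of the original report.

```python
# Group means read from Table 3: (first 30 trials, second 30 trials, all 60 trials).
table3 = {
    8: (16.60, 17.10, 33.70),
    9: (17.19, 18.33, 35.52),
    10: (15.08, 16.60, 31.68),
    11: (16.10, 16.43, 32.53),
    12: (13.45, 13.40, 26.85),  # control group, no response-contingent light in training
}


def halves_match_total(first: float, second: float, total: float, tol: float = 0.005) -> bool:
    """True when the two half-series means sum to the reported 60-trial mean (within rounding)."""
    return abs((first + second) - total) < tol


for group, (first, second, total) in table3.items():
    print(group, f"{first + second:.2f}", f"{total:.2f}", halves_match_total(first, second, total))
```

Each row is internally consistent, so the 60-trial column can be treated as the sum of the two 30-trial columns.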

It was feared that the reinforcing 
effect after a delay would be slight 
and a lengthy series of testing trials 
would be necessary to reveal it. Ac- 
cordingly, a series of 60 testing trials 
was provided for Experiment III. 
Nevertheless, it is possible that during 
the course of numerous reinforcement 
trials a limited series of reinforcible 
responses may by chance occur with 
sufficient frequency to generate appreciable
reinforcing qualities according
to the principle of response-
contingent feedback stimulus provided
by Hypothesis I. Because of this 
possibility the data for the first 30 
trials were analyzed separately. It 
was found that Groups 8, 9, and 11 
gave significantly more reinforcible re- 
sponses than Group 12, while Group 
10 (the 3 hour delay group) was not 
significantly different from Group 12 
(the control group). 

Thus it appears that the reinforcing 
effect is maintained for a 24 hour 
period, but the particularly weak 
effect after 3 hours is unexplained. 
Since all but the Group 10 children 
were tested in the morning, perhaps it 
represents some adverse effect or some 
peculiarity of the afternoon schedule; 
e.g., the approach of time to go home. 

It should be noted that the expected 
number of reinforced responses for the 
60 testing trials of Experiment III 
would be 30.00. From this standpoint
the reinforcing qualities manifested
by training Groups 8, 9, 10, and
11 are slight and are significant only
when contrasted with Group 12 
(Table 3) where the reinforcing effect 
of the feedback light stimulus appears 
to be smaller (the difference is not 
significant) than that expected for a 
neutral stimulus. This apparent bias 
among those who had been trained 
without a response-contingent feed- 
back stimulus was not found in Ex- 
periment II, where only 20 training 
trials were used. Nevertheless, it is 
a potentially interesting effect and 
will be scrutinized fully in subsequent 
experiments. 
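
The size of these effects relative to chance can be made explicit with a little arithmetic. The sketch below uses the 60-trial means from Table 3 and the chance expectation of 30.00 stated above; the names are illustrative only.

```python
EXPECTED_BY_CHANCE = 60 * 0.5  # two alternative buttons, 60 testing trials

# 60-trial group means from Table 3 (Group 12 is the control group).
means_60 = {8: 33.70, 9: 35.52, 10: 31.68, 11: 32.53, 12: 26.85}

# Departure of each group from the chance expectation.
deviations = {group: mean - EXPECTED_BY_CHANCE for group, mean in means_60.items()}

# Advantage of each trained group over the control group.
vs_control = {group: mean - means_60[12] for group, mean in means_60.items() if group != 12}

for group in means_60:
    line = f"Group {group}: {deviations[group]:+.2f} relative to chance"
    if group in vs_control:
        line += f", {vs_control[group]:+.2f} relative to Group 12"
    print(line)
```

The trained groups lie a few responses above 30.00 while the control group lies below it, so the contrasts with Group 12 are larger than the departures from chance, as the paragraph above observes.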


APPARATUS FOR EXPERIMENT IV 


The questions concerning the status of 
Hypothesis I, which Experiment IV was de- 
signed to illuminate, required the construction 
of a new piece of apparatus. Since on the 
basis of Experiments I, II, and III it appears 
that the reinforcing effect generated for the
response-contingent stimulus may be relatively
small and conceivably could be due to
unintentional errors or response biasing char- 
acteristics of the experimenter, the new ap- 
paratus was designed to provide automatic re- 
cording of the stimuli and the responses, as 
well as automatic control of the temporal 
characteristics of the stimuli. 

The new apparatus, like the original one, 
was assembled on a panel of ł inch plywood, 
approximately 24X40 inches. This horizontal 
panel was divided vertically across its length 
by a second 3 inch panel, approximately 30 
inches high. On the subject's side of this 
divider was a box 163 inches long, 123 deep 
from front to back, and 4} inches high. The 
surface of this box contained a recess which 
measured 13} inches in length, 13 inches in 
depth, and 6 inches from front to back. This 
recess contained four equally spaced manipu- 
landa which were spring-loaded levers ver- 
tically mounted in such a way that they could 
be swung from right to left by the subject.
Each of these manipulanda had its own close- 
fitting rectangular cover which could be with- 
drawn or extended through the vertical divid- 
ing panel by the experimenter, thereby per- 
mitting him to control which manipulanda 
were exposed to the subject. Along the sur- 
face of the box, near the edge close to the sub- 
ject was a bar. This bar served the same 
general purpose as the large resting button on 
the original apparatus so that a response could 
comprise the subject’s removing his hand from 
the resting bar and swinging one of the levers 
from right to left. The surface between the 
bar and the manipulanda aperture contained 
a row of small red signal lights which permitted
the experimenter to indicate to the subject 
when he should begin his response. 

On the experimenter’s side of the panel were 
signal lights which indicated when the sub- 
ject's hand was on the resting bar and when 
a manipulandum was deflected. A micro- 
switch button permitted the experimenter to 
signal the subject when a response was 
desired. 

The apparatus employed in Experiment IV 
was different from the original apparatus in 
that the interval between the subject’s re- 
sponse and the contingent feedback stimulus 
was electronically controlled at a duration of 
one second and the duration of the feedback 
stimulus was also electronically controlled at 
a fraction of a second. An amplifier and a speaker were
mounted on the experimenter’s side of the 
vertical panel, and an audio-oscillator set at 
500 cycles provided the source of the auditory 
response-contingent feedback stimulus. Decibel
settings on the oscillator, although not
standardized for the purpose of the experiment,
afforded some systematic variation in
the intensity of the auditory stimulus. The 
apparatus was wired in such a way that a 
10-channel Esterline-Angus recorder could 
provide an automatic record of the entire 
series of signal-response-feedback stimulus 
sequences. 


EXPERIMENT IV 


In evaluating the results of the 
first three experiments, it could be 
argued that the appearance of the 
light following the pressing of the 
button is so familiar and so practical 
in everyday life that it has acquired 
the properties of a secondary rein- 
forcement. On this basis the small re- 
inforcing effect observed for the feed- 
back light would not require Hypothe- 
sis I for an explanation, and our inter- 
est should shift to the question of why 
the reinforcing effect was not observed 
in the control groups trained without 
a response-contingent stimulus. Per- 
haps the absence of this reinforcing 
effect among those subjects whose 
training trials did not involve a 
response-contingent light, but did in- 
volve some other experience with the 
light or the button pressing response 
could be explained in terms of some 
temporary suppression or extinction of 
the secondary reinforcing properties 
of the lights. The comparisons re- 
quired by Experiment IV involve a 
relatively unfamiliar response (swing- 
ing a lever transversely from right to 
left) and a relatively unfamiliar stimu- 
lus (a pure tone). The explanation 
that the effect in question may be due 
to the possibility that the response- 
contingent stimulus had secondary re- 
inforcing characteristics before the 
training trials began may be less 
plausible for Experiment IV than it
was for the case of a light stimulus 
in Experiments I, II, and III. 

As a safeguard, however, Experiment
IV was preceded by a preliminary
exploration where a 500-
cycle tone was used as a reinforcing
stimulus in a direct series of testing 
trials (not preceded by training trials) 
for a group of 12 eighth grade students 
drawn from the population which was 
the source for the Experiment IV 
sample. In this series all four manipu- 
landa were exposed to the subject, and 
he was instructed to respond to the 
signal light by removing his hand from 
the resting bar, swinging the manipu- 
landum of his choice, returning his 
hand to the resting bar, and upon the 
light signal from the experimenter 
again remove his hand from the rest- 
ing bar and swing the manipulandum 
of his choice, etc. The series com- 
prised 60 responses for each subject. 
Since the subject had no training 
trials, it is argued that if the sound 
were initially an effective reinforcer it 
could be revealed in this sequence of 
60 testing trials; certainly it could 
not be claimed that prior training 
trials which did not follow a response-
contingent stimulus pattern had interfered
with a pre-established secondary
reinforcing value for the tone.
The particular manipulandum which 
was to be reinforced for a given sub- 
ject was determined by a prearranged 
schedule. The results for the first 
30 trials and for the second 30 trials 
were examined separately. For the 
first 30 trials, the average number of 
reinforcible responses was 7.5. For 
the second group of 30 trials, the aver- 
age was 8.25. Since the subjects had 
four manipulanda available to them, a 
perfectly unbiased selection would 
have resulted in 7.5 reinforcible re- 
sponses in 30 trials. It is apparent, 
therefore, that neither the first half 
nor the last half of this series of 60 
trials provides any support for an 
interpretation that the tone employed, 
500 cycles at a 20 decibel setting,† had
appreciable value as a secondary reinforcer
for the class of response under
consideration.

† The actual intensity of the sound was not
determined and it may be best described as a
soft, but clearly audible tone.

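
The chance baseline used in this comparison follows directly from the number of available manipulanda. The sketch below assumes that an unbiased subject chooses independently and with equal probability on every trial (an idealization not tested in the report); the function name is hypothetical.

```python
from math import sqrt


def chance_baseline(n_trials: int, n_alternatives: int) -> tuple[float, float]:
    """Mean and standard deviation of reinforcible responses for one subject choosing at random."""
    p = 1 / n_alternatives
    return n_trials * p, sqrt(n_trials * p * (1 - p))


mean, sd = chance_baseline(30, 4)  # four manipulanda, 30 trials
print(f"expected = {mean:.2f}, SD for a single subject = {sd:.2f}")

# Observed averages for the first and second blocks of 30 trials.
for observed in (7.5, 8.25):
    print(f"observed {observed:.2f}: {observed - mean:+.2f} relative to chance")
```

The same function gives 15 for the two-choice, 30-trial tests of Experiment II and 30 for the 60-trial tests of Experiment III.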
Accordingly, Experiment IV was 
undertaken on the assumption that 
the tone did not have a pre-established 
value as a reinforcer, at least for the 
present responses. Whether the small 
insignificant increase from 7.5 in the 
first 30 trials to 8.25 in the last 30 
trials reflects the accrual of some rein- 
forcing properties to the tone during 
the course of testing is an interesting 
topic for speculation; certainly no subject
got “insight” and developed a
lengthy series of “reinforcible” re- 
sponses. The nearest thing to this 
was the performance of one child 
whose last eight response choices were 
reinforced. 

Experiment IV examines the following
questions:


1. Is the response-contingent feed- 
back stimulus principle as defined by 
Hypothesis I applicable to auditory 
stimuli as well as to visual stimuli? 

2. If Hypothesis I is applicable to 
auditory stimuli, does the intensity of 
the feedback stimulus during training 
trials affect the subsequent efficacy of 
the stimulus as a reinforcer? 

3. Is there some important inter- 
action between the intensity of the 
stimulus during the training trials and 
the intensity of the stimulus during 
the testing (reinforcing) trials which 
would require that the stimulus in- 
tensity used in reinforcement must be 
comparable with the stimulus inten- 
sity used during training? 

4. Is the reinforcing quality which 
is conferred to a stimulus under the
conditions of Hypothesis I apparent
only among subjects who readily in- 
dicate an awareness that their re- 
sponse selection during the rein- 
forcing trials was guided by the rein- 
forcing stimulus? 


Method 


Since Experiment IV involved the use of a 
spring-loaded lever which was difficult for 
small children to manipulate and since a 
posttesting inquiry concerning subject's choice 
of response was anticipated, children from the 
eighth grade were used instead of younger 
children. The total sample comprised 144. 
The design of the experiment called for four 
different training conditions: 36 were trained 
without a feedback stimulus; 36 were trained 
with a feedback stimulus of low intensity (a 
decibel setting of 20); 36 were trained with a 
stimulus of moderate intensity (a decibel 
setting of 35); and a final group of 36 was 
trained with a feedback tone which could be 
described as loud but which was not described 
as painful (a decibel setting of 50). The 
children trained under each condition were 
assigned at random to be tested with a feed- 
back stimulus of one of the three levels of in- 
tensity. Thus the general design may be seen 
as a factorial arrangement requiring four dif- 
ferent training conditions and three different 
testing conditions (12 different experimental 
groups). There were 12 children assigned to 
each of these 12 groups, and each child re- 
ceived a series of 30 training trials followed 
immediately by 30 testing trials. 
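
The factorial arrangement can be sketched as follows. This is an illustration under stated assumptions rather than the investigators' procedure: the report says only that children within each training condition were assigned at random to a testing intensity, whereas the sketch randomizes the whole allocation in one step, and the labels, seed, and function name are hypothetical.

```python
import random
from itertools import product

TRAINING = ["none", "low (20)", "medium (35)", "high (50)"]  # decibel settings in parentheses
TESTING = ["low (20)", "medium (35)", "high (50)"]
PER_CELL = 12


def assign(subject_ids, seed=0):
    """Randomly allocate subjects so that every training x testing cell receives PER_CELL children."""
    cells = list(product(TRAINING, TESTING))  # 4 x 3 = 12 experimental groups
    ids = list(subject_ids)
    if len(ids) != len(cells) * PER_CELL:
        raise ValueError("sample size must equal number of cells times children per cell")
    random.Random(seed).shuffle(ids)  # ASSUMPTION: simple shuffle; seed fixed for reproducibility
    return {cell: ids[i * PER_CELL:(i + 1) * PER_CELL] for i, cell in enumerate(cells)}


design = assign(range(144))
for (training, testing), children in design.items():
    print(f"trained: {training:12s} tested: {testing:12s} n = {len(children)}")
```

Each training condition then contains 3 x 12 = 36 children, which matches the allocation described above.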

In Experiment IV, as in the other experi- 
ments, the subjects were seated throughout 
the trials. The children were asked to use 
their dominant hand and to place the other 
hand in the lap. During training the second 
lever from the left was exposed, while the 
other three were closed. During testing the 
first and third levers from the left were ex- 
posed, while the remaining two remained 
covered. Thus the testing trials comprised a 
simple discrimination learning situation cor- 
responding with the testing procedure that 
was used in Experiments II and III. After 
the testing sequence was completed, each 
child was asked why he had selected the lever 
he preferred. No pointed inquiry was made 
for fear that a specific interest on the part of 
the investigator would be transmitted to the 
remaining subjects in a way which could bias 
either their performance or their response to 
the inquiry. 


Results 


Table 4 gives the average number of
reinforcible responses made by each
of the 12 subgroups. The differences
between Training Conditions 1 and 2,
1 and 3, and 1 and 4 are all significant
at the 5% level or better when examined
by the t test under the two-
tailed convention. The direction of
all of these differences corresponds
with the requirements of Hypothesis I
and indicates that this hypothesis is
applicable to a contingent sound stimulus
of 500 cycles at any one of three
levels of intensity. The interaction
between stimulus intensity during
training and stimulus intensity during
testing is not significant, and there is
no suggestion that within the ranges
provided by the present experiment a
congruence in intensity level between
training and testing produces more
efficacious reinforcement than lack of
congruence. It should be noted that
among the 9 groups trained and tested
according to the requirements of
Hypothesis I the lowest score was
yielded by that particular subgroup
which was exposed to the high intensity
in both the training and testing
trials. The contrast between this
group and the other eight groups
trained according to Hypothesis I was
not significant, but it may indicate
some aversive reaction to loud sound.


TABLE 4

AVERAGE NUMBER OF REINFORCED RESPONSES
FOR GROUPS TRAINED AND TESTED
WITH VARIOUS INTENSITY
OF FEEDBACK SOUND:
EXPERIMENT IV

Sound intensity during training | Sound intensity during testing: Total
None                            | 13.79
Low                             | 18.56*
Medium                          | 20.77*
High                            | 19.47*
Total                           | 18.15

* Average combined number of reinforced trials is
higher at the 5% level than the combined average for
the group without a response-contingent feedback.
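
The Total column of Table 4 can be checked, and the size of the training effect expressed, with simple arithmetic. The sketch below relies on the fact that every training condition contained 36 children, so the grand mean is the unweighted average of the four row totals; the names are illustrative.

```python
# Row totals from Table 4: mean reinforced responses (30 testing trials) by training intensity.
row_means = {"none": 13.79, "low": 18.56, "medium": 20.77, "high": 19.47}

# Equal group sizes (36 per training condition) justify an unweighted average.
grand_mean = sum(row_means.values()) / len(row_means)
print(f"grand mean = {grand_mean:.4f} (tabled total: 18.15)")

# Gain of each response-contingent training condition over training without feedback.
gains = {level: mean - row_means["none"] for level, mean in row_means.items() if level != "none"}
for level, gain in gains.items():
    print(f"{level}: {gain:+.2f} responses relative to no-feedback training")
```

The recomputed grand mean agrees with the tabled total to within rounding, and all three feedback conditions exceed the no-feedback condition by several responses.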

When asked why they chose their 
response, there were 76 who referred
to the reinforcing sound in some manner
or another. There were 68, however,
who stated they preferred the
location of the chosen manipulandum 
or offered some other remark unre- 
lated to the feedback stimulus, in- 
cluding the opinion that they had no 
preference or had no idea why they 
might have chosen one manipulandum 
in preference to another. Of the 68 
subjects whose comments did not in- 
volve the feedback stimulus, 24 had 
been trained without a feedback stim- 
ulus, and of these only 5 preferred the 
reinforced response in more than one 
half of the 30 testing trials. The re- 
maining 44 children who did not asso- 
ciate the reinforcing feedback stimulus 
with their preference during the test- 
ing trials had been trained with the re- 
sponse-contingent feedback stimulus. 
Of these 44, 27 gave the reinforcible 
response in more than half of their 30 
testing trials. (A fourfold test of in- 
dependence yields a chi square which 
is significant at the 1% level.) It is 
apparent, therefore, that the reinforcing
effect of the feedback stimulus for
children trained under the response-
contingent conditions of Hypothesis
I is not limited to those particular individuals
who associate the reinforcing
feedback stimulus with their choice
of manipulandum during the testing 
trials. Because of the casual nature of 
the inquiry, the possibility that some 
such association exists in the aware- 
ness of the children who provided 
more than 50% reinforcible responses 
cannot be excluded, but evidence of 
such an association certainly did not 
characterize the verbal responses of all 
the children for whom the feedback 
stimulus showed some reinforcing 
effect. 
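
The fourfold test of independence can be reproduced from the counts given above. The sketch below computes the Pearson chi square for the 68 children who did not mention the feedback stimulus; whether the investigators applied a continuity correction is not stated, so both versions are shown, and the function name is hypothetical.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = False) -> float:
    """Pearson chi square (1 df) for the fourfold table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:  # optional continuity correction
                diff = max(diff - 0.5, 0.0)
            total += diff**2 / expected
    return total


# Rows: trained without feedback (24 children), trained with feedback (44 children).
# Columns: reinforced response on more than half of the 30 testing trials (yes, no).
chi2 = chi_square_2x2(5, 24 - 5, 27, 44 - 27)
chi2_corrected = chi_square_2x2(5, 24 - 5, 27, 44 - 27, yates=True)
CRITICAL_1PCT_DF1 = 6.635  # tabled chi-square value for the 1% level with 1 df

print(f"chi square = {chi2:.2f}, with continuity correction = {chi2_corrected:.2f}")
print("significant at the 1% level:", chi2 > CRITICAL_1PCT_DF1, chi2_corrected > CRITICAL_1PCT_DF1)
```

Both values exceed the 1% criterion, in agreement with the result reported in the text.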


DISCUSSION


All of the foregoing experimental interpretations
of Hypothesis I are confirmatory,
and none provides indication
of an important limitation. It is
now apparent that the principle de- 
scribed in Hypothesis I is applicable 
within at least two sensory modalities 
(vision and audition) and to two kinds 
of manual responses (pressing a but- 
ton and moving a lever). Neverthe- 
less, further experiments are being con- 
ducted to explore the manner in which 
the applicability of Hypothesis I is 
qualified. 

It is of interest to review several of 
the observations which were made in 
the course of the four experiments. 
The selection of the number of train- 
ing trials and the number of testing 
trials for Experiment I was an arbi- 
trary decision, but the analysis of the 
data indicated that the reinforcing 
efficacy of the response-contingent 
stimulus was manifested gradually in 
a manner characteristic of learning 
curves and that 12 trials were not 
sufficient to show this effect, while 24 
were. As a consequence of this ob- 
servation, the subsequent experiments 
were planned with a minimum of 30 
testing trials. 

Experiment II indicated that the 
gross effect of Hypothesis I could be 
produced by as few as 20 training 
trials, but the indications of the pos- 
sible reinforcing value of “no feedback 
light” for subjects who had been 
trained without a feedback light was 
not apparent in Experiment II with 
only 20 training trials. The possi- 
bility of “no feedback stimulus” being 
a reinforcer was suggested by Experi- 
ment III where there were 30 training 
trials, and this possibility was con- 
sistent also with the data from Experi- 
ment IV which provided 30 training 
trials. Specifically, for Group 12 of 
Experiment III there was no response- 
contingent stimulus in the training 
trials, and for this group fewer (the 
average was 26.85) than the expected 
number (an average of 30 would be expected
for 60 trials) of reinforced responses
were recorded for the testing
trials. This could suggest either that 
for this group the reinforcing feedback 
stimulus had acquired aversive quali- 
ties or that, since the response- 
contingent training situation was not 
characterized by the feedback light, 
the no feedback light was reinforcing 
in the testing situation. This possi- 
bility is reintroduced by the data of 
Experiment IV where for Training 
Condition 1 (no response-contingent 
feedback stimulus) the number of re- 
inforced responses for testing condi- 
tions involving a reinforcing stimulus 
of low intensity, moderate intensity, 
and high intensity yielded reinforcible 
responses having the respective aver- 
ages of 12.83, 13.91, and 11.75, all 
lower than the average of 15.00 which 
would be the expected average for a 
series of 30 testing trials where the re- 
inforcing stimulus was unbiased by 
prior training effects. Perhaps the 
testing (reinforcing) situation which 
did not contain a feedback sound was 
somewhat reinforcing for subjects 
who had been trained without a re- 
sponse-contingent feedback sound. 

Experiment III was concerned with 
the enduring qualities of contingent 
reinforcers, and for this reason the 
testing trials were extended through a 
series of 60 responses. Independent 
analyses of the second 30 of these test- 
ing trials revealed that the level of per- 
formance for the second 30 trials was 
not significantly superior to that for 
the first 30. On this basis and on the 
consideration that reinforcing quali- 
ties might possibly accrue from an ex- 
tended testing series it was inferred 
that 30 testing trials may be an opti- 
mal number for such experiments. 
Accordingly, 30 testing trials were 
planned for Experiment IV. 

Since cognitive explanations for the
present phenomenon could only beg
the behavioral question posed by Hypothesis
I, the investigators are most
interested in the conditions under 
which a presumably neutral stimulus 
may acquire reinforcing characteris- 
tics, and not what the subject thinks 
about such stimuli. Nevertheless, 
there was some interest in the extent 
of the subject's concern with an in- 
sightful awareness. Accordingly, at 
the conclusion of the Experiment IV 
trials, the subjects were given an op- 
portunity to comment upon the basis 
of their choice of response. The re- 
sults of the analysis of the data for 
those 68 subjects whose comments 
omitted any reference to the feed- 
back stimulus sustain the writers’ 
opinion that cognitive consideration 
need not be emphasized at this point. 
It may be noted, however, that in the 
training trials the subjects are ap- 
parently not learning an S-R but may 
be learning an “expectation” or a 
“hypothesis,” possibly similar to some 
of the conceptualizations of Tolman 
(1932). Although it is not apparent to 
the writer that Tolman’s conceptualizations
provide a basis for anticipating
Hypothesis I, it is possible that some 
such principle as Hypothesis I might 
be useful in describing the conditions 
under which goals, expectations, and 
predispositions may be acquired. 

On the contemporary stage, how- 
ever, the phenomena described by 
Hypothesis I are reminiscent of the 
studies of sensory reinforcement. Par- 
ticularly, they rest upon indications 
that stimuli which have no apparent 
value for reducing primary drives and 
which may be claimed as secondary 
reinforcers on an ex post facto basis 
only may, nevertheless, be shown to 
possess reinforcing characteristics when 
used in an operant conditioning situa- 
tion. To claim that such stimuli are 
merely secondary reinforcers is not 
particularly helpful. From certain
standpoints, at least, the nature of
secondary reinforcement itself is far 
from clear (Myers, 1958). Certainly 
explicit demonstrations wherein sec- 
ondary reinforcers are established and 
successfully applied are disappoint- 
ingly meager (Schoenfeld, Antonitis, 
& Bersh, 1950) when gauged from the 
standpoint of the extent to which this 
concept may be relied upon to account 
for reinforcing effects. 

Nevertheless, the fact seems well 
established that stimulation via vari- 
ous sensory modalities, whether simple 
(Roberts, Marx, & Collier, 1958) or 
complex (Harlow, 1950), novel 
(Barnes & Kish, 1961) or familiar 
(Butler, 1957), can exert a reinforcing 
effect for both animal (Kish, 1955) 
and human subjects (Frey, 1960). 
The conditions under which the 
stimuli are effective reinforcers (Pre- 
mack, 1959) and the circumstances 
through which they become effective 
reinforcers remain a matter of con- 
siderable uncertainty. 

The theorizing applied to the newer 
reinforcement phenomena tends to be 
reductionistic in nature with some 
writers, e.g., Harlow (1950), believing 
that the postulated predispositions 
accrue diffusely from the structure of 
the organism, while others, e.g., Hebb 
(1949), refer to some specifiable region 
of the central nervous system. As 
Hunt and Quay (1961) have noted, 
however, a formulation of reinforce- 
ment based on Hebb’s notion that re- 
moval of accustomed stimulation may 
be disturbing to an animal and that 
restoration may be reinforcing is not 
always readily verifiable. Neverthe- 
less, there is some similarity between 
Hebb’s formulation and the present 
Hypothesis I in the particular sense 
that Hypothesis I provides for a situa- 
tion in which an organism may be 
accustomed to a stimulus and then 
posits that in a testing situation where 
the availability of the accustomed
stimulus is contingent upon the per- 
formance of a preselected act there is 
an appreciable tendency for the pre- 
selected act to occur more frequently 
than its alternative. It should be 
noted, however, that Hypothesis I is 
not a reductionistic hypothesis. It is 
a behavioristic hypothesis which pro- 
vides for a relationship between prior 
and consequent behavior events with- 
out invoking nonbehavioral character- 
istics of the organism. 
It is interesting to examine certain 
familiar behavioral phenomena from 
the standpoint of the present
Hypothesis I. For example, the effects
of intermittent reinforcement as de- 
scribed by Humphreys (1939), trans- 
lated into Hullian concepts (Hull, 
1943) by Virginia Sheffield (1949), 
and explored by Grant, Hake, and 
Hornseth (1951) may describe a situa- 
tion where primary reinforcement dur- 
ing the training trials is sufficient to 
account for a relatively consistent re- 
sponse from trial to trial. On those 
trials where no primary reinforce- 
ment occurs, however, there may be 
response-contingent feedback stimuli 
which, if consistent in nature from 
nonreinforced response to nonrein- 
forced response, would under Hy- 
pothesis I acquire reinforcing char- 
acteristics. Presumably such acquired 
reinforcing characteristics of stimuli, 
which had occurred subsequent to the 
nonreinforced responses, would pro- 
vide reinforcement during extinction 
trials and in this way account for the 
fact that those subjects in Grant’s ex- 
periment who had the fewest rein- 
forced training trials were the slowest 
to extinguish. This interpretation is 
favored by Zimmerman’s (1957) dis- 
cussion wherein he emphasizes that 
the intermittent reinforcement sched- 
ule which is most favorable from the 
standpoint of resistance to extinction 
should be one where the transition 
from 100% primary reinforcement to
something less than this should be 
gradual. On this basis, the 100% re- 
inforcement schedule favors the oc- 
currence of a consistent response which 
can persist from trial to trial so that 
a feedback experience can be contin- 
gent to a response in later unreinforced 
trials. A similar interpretation of Hy- 
pothesis I might be applicable to 
Robinson's (1961) report of persis- 
tence of responses in the absence of 
primary reinforcement. In his ex- 
periment animals learned to escape 
shock by running out of a compart- 
ment and subsequently learned to 
avoid shock by running out of the 
compartment in response to a warning 
buzzer which was terminated by their 
escape. In later trials, the animals 
learned to press the bar which ter- 
minated the buzzer despite the fact 
that the original motivating shock was 
never delivered. The persistence of 
the bar pressing response which led to 
the termination of the buzzer in the 
later trials, where the escape response 
to the buzzer had been extinguished, is 
a matter of conjecture on the part of 
Robinson. It is possible, however, 
that the termination of the buzzer
subsequent to the bar press may be
regarded as a response-contingent
stimulus which was regularly experienced
and under Hypothesis I would have 
acquired a reinforcing effect which 
could persist and thereby main- 
tain the bar pressing response in- 
dependently of the original escape 
reinforcement. 

It should be noted that Brogden 
(1962) has reported a study wherein a 
tone was delivered to cats in a re- 
sponse-contingent feedback situation. 
In consequence of this training, the 
tone acquired a capacity to elicit the 
response. Although Brogden’s study
does not examine the possible reinforcing
qualities of the stimulus and
does not, therefore, relate to the
conditions under which a stimulus
acquires reinforcing properties, it does
correspond with a phenomenon which 
the writer and his associates have re- 
cently observed in human subjects. 

THE RELATION BETWEEN TEST
AND PERSON FACTORS

There has been considerable dispute about whether R (test) and Q
(person) factoring of the same data produces the same or different 
factors. This paper attacks the problem by considering the differences 
produced in factor configurations by operations on the data preliminary 
to or inherent in factoring one way or the other. It is concluded that 
although the nature of the factors does not change in any qualitative 
way, configurations may be altered sufficiently to make it appear that 
they have, and that it is therefore important to consider questions of 
sampling and measurement in evaluating the results of any factor
analysis.


The dispute about test (R) and
person (Q) factor methods has been 
sporadic and not very acrimonious. 
Furthermore, some of the central 
disputants have given ground on 
occasion, but, despite this, there has 
been produced a sharp cleavage of 
opinion about R and Q. In a word, 
opinion splits on whether R and Q 
produce substantially the same fac- 
tors, except for a first general factor. 
The two main disputants have been 
Burt and Stephenson. The situation 
has more recently become more com- 
plicated as a result of Broverman’s 
(1961) argument that the differences 
between R and Q (which he considers 
substantial) are a function, not of the 
direction of correlating, but of the 
direction in which the matrix to be 
correlated is centered. 


Outline of the Issues 


The two most elaborately developed 
points of view on R and Q are those of 
Burt (1937, 1940) and Stephenson 
(1952). Fortunately it is easy to 
bring out the issues between them 
because of a joint publication (Burt & 
Stephenson, 1939) in which the au- 
thors agreed as to most of the points 
on which they differed. Burt’s starting
point (Burt, 1937) is that if the data
are doubly centered and standardized (i.e.,
row and column means and variances
are 0 and 1), factoring by persons gives
the configuration of person vectors in 
exactly the same factor space as fac- 
toring by tests. As Burt (1940) later 
puts it for the doubly centered and 
standardized case, “the factor loadings
for persons obtained by correlating 
persons are identical with the factor 
measurements for persons obtained by 
covariating tests [p. 290].” From his 
initial starting point, Burt (1940) 
develops the position that factoring 
by tests retains a general test factor 
while losing a general person factor, 
and that factoring by persons does 
just the opposite, but that both 
methods agree in the secondary factors 
they obtain. Burt suggests that if our 
interest is in the secondary factors we 
decide between the R and Q options 
on the basis of economy. Stephenson 
(1952; Burt & Stephenson, 1939) dis- 
agrees with Burt not so much on a 
point of mathematics as on a point of 
inferential logic which he takes to be 
prior and to invalidate Burt’s whole 
mathematical argument. Stephen- 
son’s (Burt & Stephenson, 1939, 
p. 275) point hinges upon what he 
considers a difference in the aims of
R and Q and upon a distinction be- 
tween populations and variables. The 
difference Stephenson sees in aims is 
that, whereas R technique seeks to 
discover the fundamental components 
of a variate—to break up its variance 
into communal parts—Q technique 
seeks to verify hypotheses about the 
dependence of one variable and an- 
other. Stephenson, using terms which 
have caused much confusion, dis- 
tinguishes R as a form of interdepend- 
ency analysis, and Q a form of de- 
pendency analysis. R uses testable 
attributes as variates and individuals 
as populations whereby to observe the 
covariation of the attributes; Q 
samples, on the other hand, are com- 
posed “to entail the interdependencies 
of the theory at issue [Stephenson, 
1952, p. 486]” and those variates are 
used which test the dependencies. 
Stephenson has made much of the 
distinction between interdependency 
and dependency analyses as a funda- 
mental division of methodologies. It 
was, however, first proposed by Ken- 
dall (Kendall & Babbington Smith, 
1950) as only a “useful distinction [p. 
60].” If we forget the terminology, 
and look at what Stephenson says, he 
seems to mean that in R analysis one 
is interested in studying intrinsic, and 
therefore universal, connections be- 
tween variables—i.e., interdepend- 
encies—and that one needs to use 
large and unbiased samples of in- 
dividuals in order to do so. In Q 
analysis, on the other hand, one is 
interested in effects which occur as a 
result of manipulations by the theorist 
of what he considers to be independent 
variables. Thus extraversion (E) and 
neuroticism (N) may have no intrinsic 
connection and thus be uncorrelated 
in the population at large, or orthog- 
onal in an R analysis. If we select 
samples of hysterics and normals, 
however, we shall find (assuming
Eysenck to be correct) a correlation 
between E and N, since the hysteric 
group will be both extraverted and 
neurotic. The independent variables 
here would be clinical entities, the 
experimenter’s manipulation his selec- 
tion of subjects. Alternatively, the 
experimenter may compose structured 
Q samples of statements—say state- 
ments directly indicating hysteria, 
introverted normality or extraverted 
normality, and statements indicating 
neuroticism only, or introversion only, 
or extraversion only—and, using a 
blindly selected sample, or even a 
single individual (the landlady, for 
example) predict linkages between 
statements from the theory. The 
point of confusion in Stephenson’s 
presentation seems to be that the 
dependent effects he talks about are 
themselves usually dependencies, i.e. 
correlations or associations between 
variables, rather than variations in 
magnitude or frequency. More com- 
monly, in experimental work, the 
effects would be effects of magnitude, 
not association. 

Cattell (1952) and Eysenck (1953) 
have followed a line quite similar to 
Burt's, developing and expounding it 
in a less mathematical and therefore 
probably more influential fashion. 

Burt, Cattell, and Eysenck say the 
factors after the general factor are the 
same, Stephenson that their position 
is a misunderstanding of the whole 
issue, that the “same or different” 
question is one which cannot be 
asked at that level. The most recent 
entrant to the dispute is Broverman 
(1961) who takes the Burt-Cattell- 
Eysenck question to be sensible, but 
gives a different answer. He argues 
that R and Q give different factors, 
and that we recognize two different 
classes of factor accordingly, but he 
also argues that the difference is due 
to the direction in which the data
matrix is centered either before or in 
the process of factoring rather than 
to the direction along which the 
factoring itself is conducted. 

The present paper dodges Stephen- 
son’s question as to aim and the 
variate-population distinction and 
concerns itself with the problem of the 
correct answer to the question “same
or different?” 


LINE OF ATTACK


One way to attack the problem is 
through examples. One may take 
some data, apply R analysis, then Q, 
center the means, and start again, 
standardize the variances as well as 
the means and start yet again. By
comparing results we can determine 
whether the methods give the same or 
different results. The problem is the 
choice of examples, since, with high 
probability, general features will be 
confounded with peculiarities of the 
examples chosen. And if the out- 
comes may be similar or different 
depending upon certain critical data 
features—and we shall show that this 
is so—compiling examples is likely 
to compound confusion. 

The attack we shall use will be to 
state the factor model in broad terms 
and then study the variation produced 
in the factor solution by operations on 
the means and variances of data 
arrays. When this is done, examples 
will be given to show how the same 
operations on different data may 
produce factor pictures which change 
very little or factor pictures which 
appear to change a great deal. 


FACTOR MODEL AND ITS
VICISSITUDES


Stripped to the bare, formal bones,
the factor model relates a matrix of
data scores $D$, of $N$ rows by $n$ columns
($N$ subjects by $n$ tests), to a factor
framework by the equation:

$$D = F_N F_n' \tag{1}$$

where $F_N$ is a matrix of subject projections
on factor axes, and $F_n$ a
matrix of test projections on the same
axes (cf. Anderson & Rubin, 1956).
On psychological grounds it may be
insisted that $F_N$ or $F_n$ have a particular
form, that certain dimensions be disregarded
as unique or as error, and so
on. But this need not concern us at
all.

The standard mathematical de- 
composition of the matrix D is 


$$D = U B W' \tag{2}$$

where $U$ and $W$ are orthogonal matrices
(i.e., $U'U = W'W = I$) and
$B$ is a diagonal matrix with as many
nonzero entries as the rank of $D$.
Mathematicians call $U$ and $W'$ respectively
the left- and right-hand
eigenvectors of $D$, and the diagonal
values of $B$ the eigenvalues or latent
roots of $D$.

The equation above may be re- 
written 


$$D = U B^{1/2} \cdot B^{1/2} W' \tag{3}$$

It follows that

$$F_N = U B^{1/2} T \tag{4}$$

and

$$F_n = W B^{1/2} T \tag{5}$$

where $T$ is an orthogonal transformation
matrix, one which geometrically
rotates the axes orthogonally. $F_N$ and
$F_n$ are the factor matrices familiar to
psychologists. The $T$ matrix represents
freedom to rotate. Equations 4
and 5 do no more than bring out the
relationship between the factor matrices
of psychology and the more
standard expressions of matrix algebra.
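
Equations 1 to 5 can be checked numerically. The following is a minimal sketch, assuming NumPy's reduced singular value decomposition as the decomposition of Equation 2; the data, the matrix sizes, and the 30-degree rotation are hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, r = 6, 4, 2                       # hypothetical: subjects, tests, factors
D = rng.normal(size=(N, r)) @ rng.normal(size=(r, n))   # score matrix of rank r

# Equation 2: D = U B W' (keep only the r nonzero roots)
U, b, Wt = np.linalg.svd(D, full_matrices=False)
U, b, W = U[:, :r], b[:r], Wt[:r, :].T
B_half = np.diag(np.sqrt(b))

# Any orthogonal T will do; here a plane rotation by 30 degrees (an assumption)
theta = np.deg2rad(30.0)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

F_N = U @ B_half @ T                    # Equation 4: subject projections
F_n = W @ B_half @ T                    # Equation 5: test projections

# Equation 1: D = F_N F_n', whatever rotation T was chosen
reconstruction_error = np.abs(D - F_N @ F_n.T).max()
print(reconstruction_error)
```

The rotation cancels in the product because $T T' = I$, which is the sense in which $T$ represents freedom to rotate.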
As the mathematics tells us, postmultiplying
$D$ by its transpose removes
$W$ from the picture, since

$$D D' = U B^2 U' \tag{6}$$

and premultiplying by the transpose
removes $U$, since

$$D'D = W B^2 W'. \tag{7}$$

Bearing in mind the relationships
stated in Equations 4 and 5, we can
see that the first product, $DD'$, can
thus be used to solve for $F_N$, the
second, $D'D$, for $F_n$, since, from
Equations 4, 5, 6, and 7

$$D D' = F_N B F_N', \tag{8}$$

$$D'D = F_n B F_n'. \tag{9}$$

The matrix of correlations between
people is related very closely to the
row (person) cross-product matrix
$DD'$, that between tests to the column
(test) cross-product matrix $D'D$. Indeed
if the scores have been standardized
to zero mean and unit variance,
the correlation matrices are identical
with the cross-product matrices except
for a scalar multiplier. What is done
by standard R factor analytic techniques
which work with correlations,
is to take test scores as standardized.
The factor loadings from the correlation
matrix are simple linear functions
of the factors which yield $D^*$, where
$D^*$ is standardized by rows. (The
“*” above $D$ is used to mean that it is
the old $D$ but with its rows standardized.)
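
The two cross-product identities, and the claim that correlations are cross-products of standardized scores up to a scalar, can be illustrated as follows. This is a sketch under stated assumptions: the scores are random and hypothetical, and standardization uses the population standard deviation, so that the scalar multiplier is the number of tests.

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(size=(5, 3))             # hypothetical: 5 subjects by 3 tests
U, b, Wt = np.linalg.svd(D, full_matrices=False)
B2 = np.diag(b ** 2)

# Equation 6: D D' = U B^2 U' (W has dropped out)
row_cp_error = np.abs(D @ D.T - U @ B2 @ U.T).max()
# Equation 7: D'D = W B^2 W' (U has dropped out)
col_cp_error = np.abs(D.T @ D - Wt.T @ B2 @ Wt).max()

# Standardize each row (person) to zero mean and unit variance
Z = (D - D.mean(axis=1, keepdims=True)) / D.std(axis=1, keepdims=True)

# np.corrcoef treats rows as variables, so this is the person-by-person matrix
R_persons = np.corrcoef(D)
scalar_error = np.abs(Z @ Z.T / D.shape[1] - R_persons).max()
print(row_cp_error, col_cp_error, scalar_error)
```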


Setting Means to Zero 


Let us assume the means of the 
rows of D are zero, and their variances 
are unity. Assume we have carried 
out a Q analysis by correlating subjects
and have found an $F_N$ such that
Equation 1:

$$D = F_N F_n'.$$

Assume we have also found $F_n'$. Any
entry $d_{ij}$ in $D$, the $i$th subject’s score
on the $j$th test, will be given by

$$d_{ij} = \sum_{g=1}^{r} f_{ig} f_{jg} \tag{10}$$

where $f_{ig}$ and $f_{jg}$ are entries from $F_N$
and $F_n$, respectively, and $r$ the rank of
the system, or the number of factors.
The mean of any column $j$ in $D$ will
therefore be given by

$$m_j = \frac{1}{N} \sum_{i=1}^{N} d_{ij} = \frac{1}{N} \sum_{g=1}^{r} f_{jg} \sum_{i=1}^{N} f_{ig} \tag{11}$$

If we set all the column means (all the
$m_j$’s) to zero, we can do so only by
making each $\sum_{i=1}^{N} f_{ig}$ zero. That is to
say, the effect of setting the test
means to zero in the data matrix is to
shift the centroid of the subject configuration
to the origin of the factor
space so that the mean of each factor
column in the subject factor matrix
$F_N$ is zero (Tucker, 1956). As Tucker
points out, one immediate effect this 
may have is to confuse unique factors 
with common factors. 

Conversely, setting subject means 
to zero before doing R analysis places 
the centroid of the test configuration 
at the origin. 
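
The origin shift can be verified directly. The sketch below assumes hypothetical random scores with nonzero test means and takes $T = I$ in Equation 4; it removes the test (column) means and confirms that every column of the resulting subject factor matrix has mean zero, that is, that the subject centroid sits at the origin.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 8, 4                              # hypothetical: subjects, tests
D = rng.normal(loc=3.0, size=(N, n))     # raw scores with nonzero test means
Dc = D - D.mean(axis=0, keepdims=True)   # set every test (column) mean to zero

U, b, Wt = np.linalg.svd(Dc, full_matrices=False)
keep = b > 1e-10                         # retain factors with nonzero roots
F_N = U[:, keep] * np.sqrt(b[keep])      # subject factor matrix, T = I

# Mean of each factor column over subjects: the coordinates of the centroid
factor_column_means = F_N.mean(axis=0)
print(factor_column_means)
```

The same check with rows and columns interchanged illustrates the converse statement for the test configuration.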

The effect can be seen quite clearly
in an example used by Broverman
(1961) to illustrate what he supposes
to be a change in factor structure
produced by double centering. Figure
1 shows the plot of Broverman’s factors
without first setting subject means to
zero. The origin used for the plot is
$O_1$, and it is apparent that there are
two factors and a simple structure.
The second origin, $O_2$, is set at the
centroid. From this origin it appears
that there is only one factor, departures
from the linear plot looking
like nothing more than error. But the
configuration of the tests has not
changed. All that has happened is
that the origin has shifted. Table 1
shows the factor loadings against $O_1$,
which Broverman obtained, and the
loadings he obtained by centering,
recomputing correlations, and factoring.
It also shows readings from Figure 1
upon the axis through $O_2$, these values
being corrected for magnitude. It is
clear that the difference between first
and second factor outcomes can be
accounted for by the shift in origin.
As we shall show in a later example,
the shift in the centroid with respect
to the origin, which is only a trivial
change in a strict mathematical sense
(since all it means is a change in
reference point, not in the configuration
itself), is far from trivial in
practical terms, since it may produce
what appear to be strikingly different
factor patterns. It can certainly
cause a difference in interpretation
because the shift is to the centroid of
the test configuration and thus necessarily
produces bipolar factors. The
bipolar factors may not be simply
defined, as the one in Broverman’s
example is, but since tests will be
located in at least three of four
quadrants of the factor space (in the
2-factor case), at least one bipolar
factor will be needed to describe the
space.

FIG. 1. Data from Broverman showing the
effect of double centering on the origin of the
factor space.

To summarize so far: (a) setting 
subject means to zero then factoring 
tests (R) shifts the origin to the 
centroid of the tests; (b) setting test 
means to zero then factoring subjects 
(Q) shifts the origin to the centroid 
of the subjects. Since Q is almost 
always carried out with double centering
(i.e., with test means set to zero
first), Q factors will almost always be
bipolar. Since R is almost always 
carried out without double centering, 
R factors will usually not be bipolar 
although, of course, they may be. 


TABLE 1

BROVERMAN’S TWO SETS OF FACTOR LOADINGS
AND LOADINGS FOR THE SINGLE
CENTERED CASE DERIVED
GRAPHICALLY(a) FROM $O_2$
IN FIGURE 1

                            Factor loadings(b)
              Broverman's         Broverman's         Graphically
              single centered     computed double     derived
              loadings            centered loadings   loadings
Test            h1       h2            h1                 h1
Height          953     -085          -945            [illegible]
Arm length      960     -005          -832               -85
Foot size       939      139          -664               -68
Neck size       021      910           584                68
Waist          -123      931           865                80
Weight         -267      974           930                97

a The graphically derived loadings were scaled up so
that the sum of the absolute values of factor cross-
products and correlations were the same.

b The average of the absolute residual values were
077 for Broverman’s computed loadings and 065 for the
graphically derived loadings.


Changing the Variances of Arrays


To adjust the variances of rows,
once means have been set to zero,
is to premultiply $D$ by a diagonal
matrix, which depends on the variances,
say $V_N$. If the raw data
matrix with means taken out is $D$,
the matrix standardized by rows,
which will produce scalar multiples of
subject intercorrelations on postmultiplication,
is $V_N D$, where $V_N$ contains
elements $\sigma_i^{-1}$. Since in Equation 8:

$$D D' = F_N B F_N',$$

it follows that

$$(V_N D)(V_N D)' = V_N F_N B F_N' V_N' \tag{12}$$

Thus premultiplication of the data 
matrix premultiplies the factor matrix 
for row elements, which means that 
rows from the factor matrix are each 
scaled up by a different multiplying 
value. Vectors from covariance mat- 
rices thus are identical with vectors 
from correlation matrices except for 
differential “stretching,” the amount 
of which depends on the variance of 
the data array involved. 

Similarly, to adjust column vari- 
ances by postmultiplication, then to 
find the column factors, is to pre- 
multiply the column factor matrix by 
the data postmultiplier. 

The effect of standardizing will 
therefore be to adjust the lengths of 
vectors in the factor space, and this, 
coupled with origin shifts due to mean 
adjustments, can further deceive the 
investigator into thinking that data 
manipulations have produced quite 
novel factor analytic outcomes. 
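
The "stretching" described above can be made concrete. This sketch assumes hypothetical random data and, for convenience, scales the row factor matrix so that $F F' = D D'$ (principal-axis scaling with $T = I$, an assumption that differs from Equation 4 only in how the roots are apportioned). It shows that standardizing rows rescales each person vector without changing its direction.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical scores: 4 persons by 6 tests, with unequal row variances
D = rng.normal(size=(4, 6)) * np.array([[1.0], [2.0], [0.5], [3.0]])
D = D - D.mean(axis=1, keepdims=True)    # row means taken out

U, b, Wt = np.linalg.svd(D, full_matrices=False)
F_cov = U * b                            # row factors from cross-products: F F' = D D'

V_N = np.diag(1.0 / D.std(axis=1))       # reciprocal row standard deviations
F_cor = V_N @ F_cov                      # each row multiplied by its own constant

# Cross-products of the row-standardized data are reproduced by the rescaled rows
lhs = (V_N @ D) @ (V_N @ D).T
gram_error = np.abs(lhs - F_cor @ F_cor.T).max()

# Cosine between each person vector before and after standardizing
cosines = np.sum(F_cov * F_cor, axis=1) / (
    np.linalg.norm(F_cov, axis=1) * np.linalg.norm(F_cor, axis=1))
print(gram_error, cosines)
```

Every cosine equals one, so only the lengths of the vectors differ between the covariance and the correlation solutions.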

Let us now examine standard- 
ization in both directions. Assume 
we have standardized our row (per- 
son) arrays to mean zero and variance 
one, in preparation for correlation by 
rows and factor analysis (Q). Assume 
column (test) means are zero but the 
variances have been untouched and 
are unequal. Our analysis gives as a 
factor matrix for rows (persons), $F_{N1}$.
Now we go back to our raw, unstandardized
data, standardize column
variances, then standardize the rows
and analyze again. The second time
we find $F_{N2}$. What is the relationship
between $F_{N1}$ and $F_{N2}$?

By standardizing columns we have
postmultiplied $D$ by a diagonal matrix
$V_n$. This will have upset row variances,
so we will have to premultiply
by $V_N$ to restore them to unity. The
means, too, will generally have been
upset, but we will assume that, in this
case, they have not. Our doubly
standardized data matrix is thus

$$D_s = V_N D V_n \tag{13}$$

The correlation matrix between rows
is thus

$$R_N = V_N D V_n^2 D' V_N' \tag{14}$$

(Note that since $V_N$ and $V_n$ are diagonal,
$V_N = V_N'$ and $V_n = V_n'$.) Recall
that in Equation 2: $D = U B W'$,
and that in Equation 6: $D D' = U B^2 U'$,
the term $W'W$ disappearing because
$W'W = I$. The effect of postmultiplying
$D$ by $V_n$ is to give $D_s$ the form

$$D_s = V_N U B W' V_n, \tag{15}$$

and the right hand term $W'V_n$ no
longer has the property of orthogonality.
This means that $F_{N2}$, the
row factor matrix derived from correlating
the rows of $D_s$, will not be an
orthogonal transformation of $F_{N1}$.
All we can say is that $F_{N1}$ and $F_{N2}$ have
the same number of factors. Their
relationship otherwise will depend
on the structure of $W$ and the inequality
of the column variances. If
the column variances are roughly
equal, $F_{N2}$ will probably differ little
from $F_{N1}$. But if they are not, it is
very difficult, if not impossible, to
generalize about the relation between
the two sets of factors.

The argument here is transposable.
Adjusting row (person) variances
before correlating by columns causes
a parallel upset in the relation between
the two sets of column (test) factors
$F_{n1}$ and $F_{n2}$.
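
A small numerical check may help here. If $F_{N2}$ were an orthogonal transformation of $F_{N1}$, the two matrices would share the same product $F F'$. The sketch below assumes hypothetical data and column multipliers, omits the subsequent row re-standardization (which only stretches rows), and uses a helper `row_factors` that is illustrative rather than any standard routine.

```python
import numpy as np

rng = np.random.default_rng(4)
D = rng.normal(size=(5, 3))              # hypothetical: 5 persons by 3 tests
V_n = np.diag([1.0, 0.25, 4.0])          # hypothetical, markedly unequal column multipliers

def row_factors(X):
    """Row factor matrix F with F F' = X X' (principal-axis scaling, T = I)."""
    U, b, _ = np.linalg.svd(X, full_matrices=False)
    return U * b

F_1 = row_factors(D)                     # before adjusting column variances
F_2 = row_factors(D @ V_n)               # after adjusting column variances

# F_2 = F_1 T with T orthogonal would force F_2 F_2' = F_1 F_1'
gram_gap = np.abs(F_2 @ F_2.T - F_1 @ F_1.T).max()

# With equal column multipliers the two solutions agree up to rotation
F_3 = row_factors(D @ np.eye(3))
gram_gap_equal = np.abs(F_3 @ F_3.T - F_1 @ F_1.T).max()

# The number of factors is nevertheless the same
same_rank = np.linalg.matrix_rank(D) == np.linalg.matrix_rank(D @ V_n)
print(gram_gap, gram_gap_equal, same_rank)
```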


IMPLICATIONS 


In order to make manifest what 
happens to factor patterns as a result 
of operations upon the data we have 
stated the factor model in a way 
which makes correlations incidental 
rather than primary in the factoring 
process. Taking correlations as pri- 
mary data has been a major source of 
confusion in the R, Q controversy. 

We have kept track of what hap- 
pens to factor patterns as we perform 
various operations upon our data 
matrix of scores. The major con- 
clusions are as follows: 


1. Factoring by rows where each 
row contains all the scores for a given 
subject gives the location of people in 
the factor framework.

2. Factoring by columns gives the 
location of the tests in the same 
framework. 

3. Setting column (test) means to 
zero before factoring by rows sets the 
origin of the factor space at the 
centroid of the row (person) con- 
figuration. 

4. Setting row (person) means to 
zero before factoring by columns sets 
the origin of the factor space at the 
centroid of the column (test) con- 
figuration. 

5. Altering the variances of rows 
(columns) before factoring by rows 
(columns)—say by factoring covari- 
ances instead of correlations—alters 
the lengths of the vectors correspond- 
ing to people (tests) but otherwise 
does not change the configuration 
within the factor space. 

6. Differentially altering the column 
(row) variances before factoring by
rows (columns) can upset the whole 
factor pattern, depending on the 
extent of the alteration and the 
structure of the right-hand (left-hand) 
vectors of the data matrix. 


The extent to which any of these 
operations on the data leads to 
differences in factor interpretation 
depends upon the factor configuration 
that underlies the data matrix to 
begin with. Let us now consider an 
actual example, to illustrate the 
various effects listed, and, another 
question we have left in the back- 
ground so far, the relationship be- 
tween the test configuration and the 
person configuration. 


Two Simple Numerical Examples 


It was noted earlier that the factor
model, in its simplest form, relates a
score matrix ($D$) to factor coefficients
($F_n$) and people’s scores ($F_N$) by
Equation 1:

$$D = F_N F_n'$$

In the first example there are four
tests and three people. The coefficient
matrix for the tests is


            Factors
Tests      I     II
a          1     -1
b          1      1
c         -1      1
d         -1     -1,


and the matrix of people’s factor
scores on the same two factors is


              People
Factors    1     2     3
I          1     1     1
II        -1     0     1


The score matrix is thus:


People 
Tests 1 2 3 
a 2 1 0 
b 0 1 2 
c —2 -1 0 
d 0 -1 —2 


If we factor the cross-product matrix 
for people, working from the scores as 
they stand, we find the following 
factor pattern: 


People 
Factors 1 2 3 
I 2 2 2 
II —2 0 2 


This is exactly the matrix of factor 
scores we began with, multiplied by 
the scalar value 2. 

If we now take out the test means 
before developing and factoring the 
people cross product matrix we find 
only one factor. The result we have 
is: 


People 
Factors 1 2 3 
I —2 0 2 


What has happened can be seen most
simply by looking at Figure 2. The
first analysis locates the three people
vectors with respect to $O_1$, and thus
needs two factors to do it. The result
of taking out the test means is to
shift the origin to $O_2$, and so only one
factor is needed to represent the
people vectors.

FIG. 2. The apparent loss of a factor due
to centering before an R analysis. (a) Original
factor plot. (b) Result after centering.

FIG. 3. Origin shift due to centering before
an R analysis which does not “lose a
factor.” (a) Original factor plot. (b) Result
after centering.

The second example demonstrates 
the same point, and is put in to show 
that taking out test means before 
factoring by people (or vice versa) 
does not necessarily reduce the num- 
ber of factors produced by the 
analysis. 

The example is presented graph- 
ically in Figure 3. Figure 3a gives the 
factor solution from the raw data, 
Figure 3b the solution obtained after 
removing test means. Clearly, re- 
moving test means from the data has 
again shifted the origin to the centroid 
of the people ($O_2$ in Figure 3a), but
the people points are not collinear
with $O_2$ and so two factors are still
needed to represent their disposition. 
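
The numerical values behind the second example are not given in the text, so the following sketch substitutes hypothetical factor scores chosen only so that the three people points are not collinear with their centroid. It keeps the test coefficients of the first example.

```python
import numpy as np

F_tests = np.array([[ 1, -1],
                    [ 1,  1],
                    [-1,  1],
                    [-1, -1]], dtype=float)
F_people = np.array([[ 1, 3, 1],         # hypothetical scores, not from the paper
                     [-1, 0, 1]], dtype=float)

S = F_tests @ F_people                   # tests by people
S_centered = S - S.mean(axis=1, keepdims=True)   # take out the test means

rank_raw = np.linalg.matrix_rank(S.T @ S)
rank_centered = np.linalg.matrix_rank(S_centered.T @ S_centered)

# The new origin, expressed in the original factor axes
centroid = F_people.mean(axis=1)
print(rank_raw, rank_centered, centroid)
```

Here the origin moves to the centroid of the people, yet two factors are still required after centering.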

It is important to note that if we 
factor in the other direction, in 
Example 1 (working through test 
cross products), we lose a factor if we 
take out test means. We end up with 
projections on the horizontal axis, and 
no information about the differences 
in location on the perpendicular axis. 
Two Sets of Factors or One?


The reader may harbor the uneasy 
feeling that the mathematical changes, 
shifts in origin, stretching of vectors, 
and the fact that one set of points 
rather than another is located by R 
and by Q, all add up to a lot and are 
by no means inconsistent with Brover- 
man’s conclusion that R and Q, or 
rather the centerings usually em- 
ployed with each, produce different 
sorts of factors. But it is not so. 

To review the situation and show 
what the differences amount to: The 
central fact to be held in mind is that 
to accept the factor model as appro- 
priate is, in itself, to impose a common 
set of factors upon both tests and 
people. Furthermore the mathematics
of the methods of solution for
the model (via correlation or otherwise)
are such that if either tests or
people fail to represent variation on
a factor, it may be lost. Thus if tests 
sample a verbal, a number, and a 
spatial factor, but if people have 
variable values only on the verbal, the 
verbal and only the verbal will appear 
in the analysis if means are taken out 
before or during the analysis. 
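
The verbal, number, and spatial illustration can be simulated. In this sketch all loadings and scores are hypothetical: the tests load on three factors, but the people differ from one another only on the first (verbal) factor and hold constant values on the other two.

```python
import numpy as np

rng = np.random.default_rng(5)
n_tests, n_people = 6, 10                # hypothetical sizes
F_tests = rng.normal(size=(n_tests, 3))  # loadings on verbal, number, spatial

verbal = rng.normal(size=n_people)       # people vary on the verbal factor only
number = np.full(n_people, 1.5)          # constant across people
spatial = np.full(n_people, -0.7)        # constant across people
F_people = np.vstack([verbal, number, spatial])

S = F_tests @ F_people                   # tests by people
S_centered = S - S.mean(axis=1, keepdims=True)   # remove each test's mean over people

rank_raw = np.linalg.matrix_rank(S)
rank_centered = np.linalg.matrix_rank(S_centered)
print(rank_raw, rank_centered)
```

Before the means are taken out the two constant factors merge into a single constant dimension, so the raw scores have rank two; after the means are removed only the verbal dimension survives.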

Within the factor space imposed by 
the factor model, there is both a 
configuration of test points and a con- 
figuration of people points. R anal- 
ysis solves for the test configuration, 
Q for the people configuration, but 
both within the same factor dimen- 
sions. In theory, then, both R and Q 
should identify exactly the same 
factors, not different ones. The 
practical catch is that the configura- 
tion itself is used to identify the 
factors, and that test and people 
configurations, which occupy and use 
up the same factor space, may be very 
different, and lead to quite different 
guesses at (i.e., interpretations of) the 
factors of the space in which they lie. 
In the first of our examples the people 
points fell along a straight line, the 
test points in a rectangle. In Brover- 
man’s example the set of (test) points 
looked two dimensional from one 
origin, one dimensional when the 
origin was shifted by centering the 
data matrix. Clearly, a factor analyst 
confronted with real data with prop- 
erties like these might well end up 
with quite different factor inter- 
pretations depending upon the direc- 
tion in which he chose to factor and 
upon the preliminary operations he 
carried out on the data, and the effect 
of those inherent in his technique of 
factoring. 


Do We Lose Just a General Factor? 


Burt’s position (1940) is that the 
centering operation costs us only a 
general factor, and that factors before 
and after centering, or R and Q 
factors derived through correlations, 
will substantially correspond except 
for a different general factor in each 
case, or the presence of a general 
factor in the one case and its absence 
in the other. Putting to one side 
complications due to variance—the 
differential stretchings due to stand- 
ardizing along the direction of factor- 
ing, and the nonorthogonal trans- 
formations due to standardizing in the 
opposite direction—Burt's position is 
correct only in the case where the 
general factor corresponds exactly to 
the first centroid and where variation 
in factor loadings about the mean 
factor loading is sufficiently small to 
ensure that the general factor in a 
noncentered analysis is only a very 
small one after centering. As we have 
shown, the operation of centering 
effects a shift in origin from one 
determined by mean values to the 
centroid of the configuration being 
factorially located. The factors or- 
thogonal to the first centroid in the 
noncentered case are identical (up to 
an orthogonal transformation) to the 
factors in the centered case. If, how- 
ever, our general factor is not the 
centroid in the centered case, the 
other factors orthogonal to it will 
pass through a different origin from 
that involved in the centered fac- 
torization and the two sets of factors 
will not be orthogonal transforms of 
one another. That is to say, it will be 
impossible to rotate one solution into 
the other. 


Effects on Simple Structure 


In order to gain a clearer picture of 
the practical magnitude of all the 
various effects and to clarify some 
unclear points (e.g., the protective 
value of simple structure), a set of 
artificial data was constructed which 
more nearly approximated in size and 
messiness the data of real life than the 
two earlier sets. Ten tests were 
placed along two oblique axes in a 
two-space, Tests 1, 2, 3, 4, 5, and 6 on 
one axis, Tests 7, 8, 9, and 10 on the 
other. Ten points to represent people 
were scattered arbitrarily, but with 
their major scatter along the axis of 
Tests 1-6. Figure 4a gives the results 
of the analysis of the test covariance 
matrix, and, as can be seen, there are 
two clear-cut oblique factors, one for 
Tests 1-6, one for Tests 7-10. Note, 
however, that Tests 1 and 2 form a 
separate cluster lower down the 
factor than Tests 3 to 6. Figure 4b 
gives the results of the analysis of the 
test correlation matrix after removing 
person means from the data matrix. 
Tests 3-6 and 7-10 now define op- 
posite ends of a bipolar factor, and 
Tests 1 and 2 a factor almost orthog- 
onal to it. This dramatic realignment 
comes about because of the shift of 
the origin to the centroid of the tests, 
and the stretching of each vector to 
unit length. 


FIG. 4a. Row factors from covariances, with 
column means untouched. 

FIG. 4b. Row factors from correlations, with column 
means in data matrix first set to zero. 

FIG. 4. Origin shift and “stretching” due to centering before R analysis and to use 
of correlations instead of cross products or covariances. 

One might imagine that although 
the tests realign themselves with 
respect to factors, the three clusters 
at least preserve their separate in- 
tegrity. But this would not be true 
if we correlated tests without first 
removing person means. In this case, 
the 1 and 2 cluster would be pushed 
into the 3-6 cluster, since each would 
be the same distance from the origin 
in Figure 4a, not that in Figure 4b. 
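
The effect of stretching every test vector to unit length can be illustrated with a small calculation. This is a minimal sketch under a simplifying assumption: two hypothetical tests lie on the same axis and differ only in length (one is three times the other). The scores and the function names are invented for illustration and are not the artificial data of Figure 4.

```python
from math import sqrt


def covariance(x, y):
    """Covariance of two score vectors (test means removed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def correlation(x, y):
    """Covariance after each vector has been rescaled to unit length."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical scores of five people on three tests.
test_a = [1.0, 2.0, 4.0, 5.0, 8.0]
test_b = [3.0 * s for s in test_a]   # same direction as test_a, three times as long
test_c = [2.0, 1.0, 5.0, 3.0, 6.0]

cov_ac = covariance(test_a, test_c)
cov_bc = covariance(test_b, test_c)
cor_ac = correlation(test_a, test_c)
cor_bc = correlation(test_b, test_c)

print("covariances with test_c:", cov_ac, cov_bc)
print("correlations with test_c:", cor_ac, cor_bc)
```

The covariances keep the two tests apart (one is three times the other), whereas the correlations with any third test coincide, so tests that differ only in distance from the origin fall into a single cluster once correlations are used.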

The example makes it clear that 
simple structure, by itself, would not 
be sufficient to ensure an interpreta- 
tion that was safe against the vicissi- 
tudes of the analytic machinery. At 
the same time it suggests that by a 
variety of analyses of the same data— 
deleting some tests or subjects on 
occasion, shifting means and vari- 
ances, adding new tests or subjects— 
and by a careful psychometric analysis 
of the tests to determine their meas- 
urement properties, it may be possible 
to define and to find the invariances 
of the factor configurations. How- 
ever, it is outside the competence of 
this paper to try to follow up this 
suggestion. 


CONCLUSION 


Perhaps the best way to keep one’s 
bearings in the whole situation is to 
be clear, firstly on the relation be- 
tween the factor model and the 
empirical data, secondly on the way in 
which operations to normalize and 
ipsatize the data affect the outcome 
of the analysis, and thirdly on the two 
distinct questions that can be asked 
having performed the analysis. The 
two questions are: (a) How many 
factors does the use of the factor 
model impose, and what are they? 
and (b) What is the configuration of 
the tests, or the subjects taking the 
tests with respect to the factors that 
are found? 

If we do this we shall not fall into 
the confusion of supposing that opera- 
tions on the data change the nature 
of the factors that are found, we shall 
not be surprised if we find, as we 
necessarily must, bipolar factors with 
an R analysis of ipsative scores, and 
we shall be aware of the necessity for 
proper sampling, emphasized by Ste- 
phenson (1952), if we wish to make a 
statement about a population of in- 
dividuals or tests on the basis of a 
sample. Knowing the dependence of 
the factor solution on the treatment 
of the data we shall, or we should, not 
treat it cavalierly and should ask such 
questions as whether there is informa- 
tion in the means, or the variances, 
before throwing them away. Most 
important, perhaps, if we compre- 
hend both the initial coordination 
of the factor model to the 
empirical data and its 
vicissitudes under manipulations of 
the data, we shall not make mistakes 
in interpreting and locating the factors 
themselves. 


SOME TESTS OF A THEORY OF INTRACRANIAL 
SELF-STIMULATION¹ 


J. A. DEUTSCH AND C. I. HOWARTH 
Stanford University 


Deutsch’s behavior theory is applied to intracranial self-stimulation. 
This assumes that both motivational and reinforcement pathways 
are electrically stimulated. Satiation does not occur because each 
stimulus excites the motivational pathways anew, and “extinction” is 
fast because in the absence of further stimulation the motivational 
effect decays rapidly. 3 groups of experiments supporting the theory 
are reported: (a) the tendency to perform such habits is a simple func- 
tion of time since the last brain stimulus, (b) habits learned for intra- 
cranial stimulation can be evoked by normal motivation, (c) the 
motivational and reinforcement eff[...] 
excitable and so due to different ph[...] 


The essential feature of Deutsch’s 
(1953, 1956, 1960) theory of behavior 
is that it seeks to give a circuit dia- 
gram of a system which behaves in 
the same way as the rat does. The 
units of the theory are specified pre- 
cisely, using concepts such as con- 
nections of variable resistance, 
switches, and excitation, and the inter- 
actions of the units are clearly de- 
scribed. This theory constitutes a 
system whose behavior is largely de- 
termined by its structural intercon- 
nections, and is relatively insensitive 
to the exact quantitative relations 
between the units. The same is true 
of many electronic circuits, where for 
example a multivibrator or amplifier 
circuit can be recognized even when 
the exact values of the tubes, capaci- 
tors, and resistors are not known. Simi- 
larly, a multivibrator circuit remains 
a multivibrator circuit even if the 
values of all the components are 
changed (within limits), but if a single 
connection is broken a new circuit is 
formed with completely new proper- 
ties. 

¹ ... Hamburg for his ... and encouragement. 

In this way, the theory combines a 
high order of logical precision with a 
low order of quantification, but is 
nevertheless able to make rigorous 
and testable predictions about be- 
havior. At the present time the 
theory has been supported by a large 
number of critical tests and has not 
yet met any major piece of contra- 
dictory evidence. It has been applied 
to a very wide range of rat behavior, 
for instance, drinking, drive discrimi- 
nation, latent learning, and reasoning 
behavior. (See summary by Deutsch, 
1960.) 

In the present paper we are con- 
cerned with the application of the 
theory to the phenomenon of electrical 
self-stimulation in the rat. The 
theory is peculiarly suitable for ap- 
plication to neurobehavioral data 
since one of the chief difficulties in ex- 
plaining behavior in physiological 
terms is that although we know a 
great deal about the basic units of the 
nervous system, the neurons, we are 
still very ignorant concerning the 
functional interconnections of these 
units. Structural theories such as the 
one discussed here can be considered 
blueprints for the kinds of functional 
interconnections which must exist in 
the nervous system. And the be- 
havior of rats which learn for electrical 
stimulation of parts of the brain can 
be simply explained in terms of the 
artificial stimulation of certain of these 
connections. To make this applica- 
tion of the theory comprehensible, we 
must first make a brief restatement of 
the basic features of the system. 


ELEMENTS OF THE SYSTEM 


The main problem the theory tries 
to solve is how sensory stimuli are re- 
corded in the nervous system in the 
order in which they occur, so that if 
a sequence of stimuli has in the past 
led to a goal such as food or water, 
then that sequence of stimuli will be 
sought whenever a need for food or 
water develops. 

The basic unit of the theory is 
shown in Figure 1. The analyzer is a 
perceptual element which is sensitive 
to one particular environmental cue. 
The whole unit constitutes a feedback 
system which under appropriate cir- 
cumstances will cause the animal to 
approach the environmental cue to 
which the analyzer is sensitive. The 
link is an element which becomes con- 
nected to other links by pathways 
which can carry excitation generated 
by a few “primary” links. Each 
primary link is sensitive to a particu- 
lar physiological factor such as testo- 
sterone or extracellular osmotic pres- 
sure. For convenience, we shall call 
this type of excitation “motivational 
excitation” and the connections which 
carry it “motivational pathways.” 
The analyzers of the primary links are 
sensitive to the stimuli met in eating, 
drinking, or sexual activity. 

FIG. 1. Basic unit of Deutsch’s system. 

Initially, all links possess nonfunc- 
tional connections to many, possibly 
all other links. If the analyzers at- 
tached to two links are stimulated in 
sequence, then the connection between 
them becomes functional so that moti- 
vational excitation can pass along it. 
Each time the analyzers are stimu- 
lated in sequence the connection is 
strengthened, but stimulation of the 
first analyzer not followed by stimula- 
tion of the second will lead to a 
weakening of the connection. In this 
way, the succession of stimuli en- 
countered on a well-trodden path will 
be represented as a sequence of links 
connected to each other in the order 
in which the stimuli occurred. 

To mediate this strengthening and 
weakening of the motivational path- 
ways, the links must be informed 
about the stimulation of the analyzers 
attached to other links. This infor- 
mation is carried in a second pathway 
between the links which for con- 
venience we shall call the “reinforce- 
ment pathway.” 

The chains of links when formed 
control behavior according to the fol- 
lowing rules: 

1. When the analyzer attached to 
a link is not stimulated, any excitation 
reaching it along a motivational path- 
way simply passes on to other links 
in the chain. 

2. When the analyzer attached to a 
link is stimulated by its environmental 
cue, and the link is receiving excita- 
tion along a motivational pathway, 
the excitation is diverted into the 
motor system so as to cause the animal 
to approach the part of the environ- 
ment to which the analyzer is sensi- 
tive. The motivational excitation is 
in that case not passed on to other 
links. 

When an animal is, for example, 
thirsty, the excitation generated in a 
primary link by extracellular osmotic 
pressure will flow through chains of 
links which have in the past become 
connected to this primary link. It 
will pass down these chains until it 
reaches a link whose analyzer is being 
stimulated. The animal will then 
approach the part of the environment 
which is stimulating that analyzer. 
In doing so the animal probably will 
perceive the stimulus for the next 
analyzer in the chain. The link at- 
tached to this analyzer will now divert 
the excitation from the first link and 
into its own motor system. This will 
make the animal approach the second 
stimulus. The process will continue 
with the animal following a sequence 
of stimuli which have in the past led 
to water until it reaches the water. 
For full details of the operation of the 
system, see Deutsch (1960). 
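
The two rules and the thirsty-animal example can be sketched as a short simulation. This is only an illustration under stated assumptions, not Deutsch's own formalism: the chain of cues, the function names (`capturing_link`, `run_to_goal`), and the rule that approaching one cue always brings the next cue into view (the text says only that it "probably" will) are all hypothetical.

```python
def capturing_link(chain, stimulated):
    """Return the first link, counting outward from the primary link, whose
    analyzer is currently stimulated; return None if no analyzer is stimulated.

    `chain` is ordered from the primary link (index 0) outward, which is the
    direction in which motivational excitation flows."""
    for link in chain:
        if link in stimulated:
            return link   # Rule 2: excitation is diverted to the motor system
        # Rule 1: analyzer not stimulated, excitation passes on down the chain
    return None


def run_to_goal(chain, start_cue, need_present=True):
    """List the cues approached, in order, until the goal is reached."""
    if not need_present:
        return []         # no need, so the primary link generates no excitation
    visible = {start_cue}
    path = []
    while True:
        link = capturing_link(chain, visible)
        if link is None:
            return path
        path.append(link)
        position = chain.index(link)
        if position == 0:
            return path   # the primary link's own stimulus (the goal) is reached
        # Assumption: approaching a cue exposes the next cue toward the goal.
        visible.add(chain[position - 1])


# Hypothetical chain learned on a path to water, primary link first.
chain = ["water", "spout", "alley end", "start box"]
print(run_to_goal(chain, "start box"))
print(run_to_goal(chain, "start box", need_present=False))
```

With these assumptions the simulated animal approaches the start box cue, the alley end, the spout, and the water in that order, because each newly visible cue lies nearer the primary link and so captures the excitation before it reaches the earlier link. Without a need, nothing is approached.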


APPLICATION OF THE THEORY TO 
ELECTRICAL SELF-STIMULATION 


Rats will learn to perform habits 
for electrical stimulation of certain 
parts of the brain (see, for example, 
Olds, 1958a). These habits differ 
from more normally learned habits in 
two significant and paradoxical ways: 
(a) They are very resistant to satia- 
tion. (b) In contrast, when the elec- 
trical stimulus is switched off, extinc- 
tion is extremely rapid, taking only a 
few seconds. 

The theory can be extended to ac- 
count for this behavior in a very 
simple way (Deutsch, 1960). Figure 2 
shows the postulated relationship 
between the units of the theory and 
the electrode in the brain. 

FIG. 2. Effective site of the electrode to 
explain learning for electrical stimulation of 
the brain, according to Deutsch’s theory. 


If an electrode in the brain were to 
stimulate a functional motivational 
pathway to the units responsible for 
lever pressing behavior, the rat should 
immediately press the lever as a result 
of the stimulation. This can in fact 
occur. Wyrwicka, Dobrzecka, and 
Tarnecki (1959) showed that an ani- 
mal which had been taught while 
hungry to perform a habit for food, 
would subsequently begin to perform 
it when not hungry, if it were stimu- 
lated in a region of the lateral hypo- 
thalamus. In naive animals the elec- 
trical stimulus does not at first pro- 
duce lever pressing. To explain how 
it learns to do so, one must assume 
that the electrode is stimulating both 
the motivational and reinforcement 
pathways to the units responsible for 
lever pressing (Figure 2). The mo- 
tivational pathway will initially be 
nonfunctional so that the stimulus will 
at first have no effect on the lever 
pressing. But if stimulation of the 
analyzers concerned with lever press- 
ing is followed by stimulation of the 
reinforcement pathway, the system 
will behave as if the next analyzer in 
the chain had been stimulated. As a 
result the motivational pathway will 
become functional and will be 
strengthened each time the lever 
pressing stimuli are followed by stimu- 
lation of the reinforcement pathway. 
When the motivational pathway is 
functional, the electrical stimulus will 
lead to lever pressing and the animal 
will have learned to press the lever. 

Each time the animal presses the 
lever additional motivational excita- 
tion is produced so that the habit does 
not satiate. But when the electrical 
stimulus is switched off, no further 
motivational excitation is injected and 
the animal stops pressing the lever, 
showing what appears to be very fast 
extinction of the habit. In this way, 
the theory explains the most im- 
portant peculiarities of self-stimula- 
tion behavior. 

One difficulty for our conception of 
separate motivational and reinforce- 
ment pathways is the fact that for 
learning to occur, the right pair of 
pathways must be stimulated. There 
is no reason to expect their anatomical 
arrangement to be such as to make 
the appropriate dual stimulation 
probable. In fact, the stimulus 
probably acts on a large number of 
motivational and reinforcement path- 
ways simultaneously. The stimula- 
tion of irrelevant pathways does not 
matter, provided both the motiva- 
tional and the reinforcement path- 
ways leading to links connected to the 
bar pressing analyzers are stimulated. 
The probability of this occurring will 
be greater the larger the proportion of the 
total pathways which is stimulated. 
It is easier to obtain electrical self- 
stimulation in small animals and with 
large electrodes. With smaller elec- 
trodes and larger animals, the thresh- 
old for electrical self-stimulation in- 
creases, and the probability of 
obtaining it decreases. This is 
exactly what one would expect if self- 
stimulation depended on the simul- 
taneous stimulation of two pathways 
not very close together. 

We have tested additional predic- 
tions from the theory concerned with: 

1. The idea that “extinction” is due 
to the decay of motivational excita- 
tion and is not analogous to extinction 
of more normal habits. 

2. The idea that normal motiva- 
tional pathways are involved. 

3. The idea that two separate path- 
ways are involved, one concerned with 
motivational excitation, the other 
with reinforcement. 


FIRST SERIES OF EXPERIMENTS 


The simplest way to test the hy- 
pothesis that extinction of these habits 
is due to the decay of motivational 
excitation lay in its prediction that 
extinction should be a function of time 
since the last electrical stimulus to the 
brain, and independent of the number 
of unreinforced lever presses. In 
contrast, normal extinction is almost 
entirely a function of the number of 
unreinforced trials. It is also almost 
independent of the time since the last 
reinforcement (Skinner, 1950), and is 
faster with massed rather than spaced 
trials (Rohrer, 1947). 

During preliminary observation, we 
simply removed self-stimulating rats 
from the lever by hand. They 
struggled for a few seconds and then 
lost interest. If released while strug- 
gling, they returned to the lever; if 
released when they had ceased to 
struggle, they did not return to the 
lever. This was extinction with no 
unreinforced trials at all. 

Animals which were released while 
still struggling returned to the lever 
but made a much smaller number of 
lever presses to extinction than ani- 
mals which were not removed from 
the lever. This suggested that we 
could follow the decay of the moti- 
vational process by measuring the 
number of lever presses remaining to 
extinction as a function of the time since 
the last stimulus to the brain. If 
unreinforced lever presses had no 
effect on the extinction, then the 
number of lever presses occurring 
after, say 7 seconds of normal ex- 
tinction, should be the same as the 
number of lever presses to extinction 
when the animal had been removed 
from the lever for 7 seconds, then re- 
leased. In this way, the distribution 
of lever presses during ordinary ex- 
tinction should enable us to predict 
the effect of removing the animal from 
the lever for a time before starting the 
extinction. 
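
The prediction described above amounts to counting, in a normal extinction record, the lever presses that fall after the withdrawal interval. A minimal sketch follows; the press times and the function names (`presses_remaining`, `predicted_mean`) are hypothetical, chosen so that the record matches the worked case given in the caption to Figure 3 (ten presses in all, four of them before 2.5 seconds).

```python
def presses_remaining(press_times, delay):
    """Number of presses in one normal extinction record that occur at or
    after `delay` seconds since the last brain stimulus."""
    return sum(1 for t in press_times if t >= delay)


def predicted_mean(records, delay):
    """Average predicted presses to extinction over several records."""
    return sum(presses_remaining(r, delay) for r in records) / len(records)


# Hypothetical event-recorder record: press times in seconds after the
# last brain stimulus. Ten presses, four of them before 2.5 seconds.
record = [0.4, 1.0, 1.6, 2.2, 2.9, 3.5, 4.3, 5.2, 6.4, 7.8]

for delay in (0.0, 2.5, 5.0, 7.5, 10.0):
    print(delay, presses_remaining(record, delay))
```

For this record the prediction after 2.5 seconds of lever withdrawal is six presses. If unreinforced presses themselves produced extinction, the count after withdrawal would instead stay near the full ten, which is the contrast the experiments below exploit.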


Experiment Ia 


In our first experiment (Howarth & 
Deutsch, 1962), we tested this pre- 
diction on six male Sprague-Dawley 
rats with various bipolar electrode 
placements. We compared their ex- 
tinction under two conditions, one of 
normal extinction and one in which 
the lever was removed from the cage 
for 7 seconds and then returned. 
From the records of normal extinction, 
we predicted that the average number 
of lever presses after 7 seconds of lever 
withdrawal should be 1.92. The 
average observed number was 1.87. 
The average time to the last lever 
press was 11.4 seconds for normal ex- 
tinction and 10.1 seconds for extinc- 
tion including seven seconds of lever 
withdrawal. The full data are shown 
in Table 1. The hypothesis that un- 
reinforced lever pressing is of negli- 
gible importance in extinction of these 
habits is strikingly confirmed. 


Experiment Ib 


Secondly (Howarth & Deutsch, 
1962), we used the time course of 
normal extinction to predict the effect 
of removing the lever from the cage 
for 2.5, 5.0, 7.5, and 10.0 seconds, 
before starting extinction. We used 
five naive male albino rats of the 
Sprague-Dawley strain, two of which 
had monopolar electrodes in the 
lateral hypothalamus and three mono- 
polar electrodes in the basal tegmen- 
tum. The different times of lever 
withdrawal were given in a balanced 
order with intervening trials of normal 
extinction. Figure 3 shows the result 
of the experiment. The normal ex- 
tinction data were used to predict the 
number of lever presses to extinction 
following the various periods of lever 
withdrawal on the assumption that 
extinction is a simple function of time 
since the last brain stimulus. The 
agreement between prediction and 
observation is very close. 


TABLE 1 
EFFECT OF LEVER WITHDRAWAL ON PRESSES TO EXTINCTION 

Animal | Electrode site | Number of lever presses to extinction: total normal extinction | after 7 seconds normal extinction | after 7 seconds lever withdrawal | Time of last lever press, seconds: normal extinction | after 7 seconds lever withdrawal 
20 hypothal 9.0 9.4 10.8 
21 | hypothal 9.5 i 14.7 19.4 
26 septal 3.4 3.4 
27 tegmental oe, 1.6 
15 tegmental 10.0 9.7 11.1 
15 tegmental 32 3.9 
23 | hypothal 7.5 0.25 0.0 7.0 <7.0 
21 | hypothal 11.0 1.3 1.7 9.5 8.8 
M 1.87 1.92 10.1 11.4 


Experiment Ic 


Since the data shown in Figure 3 
can be considered to reflect the decay 
of motivational excitation it should 
be possible to measure the growth of 
motivational excitation in a similar 
way. This was done in an extended 
experiment on a single animal. In- 
stead of removing the lever from the 
cage for varying intervals before ex- 
tinction, we increased the intensity of 
the stimulus by 50% for 2.5, 5.0, 7.5, 
and 10.0 seconds before switching it 
off and recording the normal extinc- 
tion curve. The result of this experi- 
ment (Howarth & Deutsch, 1962) is 
shown in Figure 4. The number of 
lever presses to extinction increases as 
a function of the duration of the 
higher intensity in a way which re- 
sembles its decrease as a result of re- 
moving the lever from the cage. The 
symmetry of Figure 4 probably does 
not mean very much. It depends on 
a lucky choice of the increment in in- 
tensity, and the curve does not really 
reach an asymptote after 10 seconds. 
After 30 seconds at the increased in- 
tensity, the number of lever presses 
to extinction had increased to 45. To 
compare the time constants of growth 
and decay of the motivational excita- 
tion would be possible only if the rate 
of lever pressing were proportional to 
the intensity of excitation. This is 
certainly not the case under the 
present conditions where the rate 
remains almost constant during ex- 
tinction until it stops altogether. 


FIG. 3. Effect of seconds of lever with- 
drawal on lever presses to extinction. (The 
circles show the average effect of withdrawing 
the lever from the box for a given time before 
replacing it, on the number of lever presses to 
extinction which immediately follows the re- 
placement of the lever. The dashed line 
shows the expected result if extinction were 
a function of the number of unreinforced 
lever presses. The solid line shows the expec- 
tation from the hypothesis that extinction in 
this case is a simple function of time since the 
last brain stimulus. The way this curve is 
derived is shown on the top line in the figure 
which represents a typical extinction record 
obtained on an event recorder. Ten presses 
altogether are made. Four of these occur 
before 2.5 seconds so that one would predict 
that six lever presses should be made if the 
lever were removed for 2.5 seconds before 
starting the extinction trials.) 

FIG. 4. Number of lever presses to ex- 
tinction as a function of duration of intensity. 
(The solid circles show the dependence of the 
number of lever presses to extinction on the 
duration of an increment in stimulus intensity. 
The crosses represent the time course of the 
subsequent extinction in the same way as the 
solid line in Fig. 3. For clarity, this is shown 
only for extinction following 0 to 10 seconds 
of increased intensity. Curves for the inter- 
mediate intervals follow straight parallel 
courses showing that the lever pressing rate 
is relatively constant until it ceases altogether. 
The exponential form of Figure 3 is almost 
entirely an artifact of the averaging of data.) 


Experiment Id 


Experiment Ib showed a very 
striking agreement in the rates of 
extinction with and without lever 
pressing. But it is just possible that 
the agreement is fortuitous, the acci- 
dental result of two different processes 
showing the same time course. To 
make the identity more likely, one 
should show that they are affected in 
the same way by another variable. 
For this purpose we chose intensity of 
brain stimulus as the independent 
variable. The theory predicts that 
time to extinction should increase 
regularly with the intensity of the 
brain stimulus and that the effect 
should be the same for extinction with 
and without lever pressing. 

Experience in the last two experi- 
ments indicated that a simpler cri- 
terion of extinction could be applied 
to extinction with and without lever 
pressing. We counted extinction com- 
plete when the animal turned away 
from the site of the lever, making at 
least a 90-degree turn and not re- 
turning to it. To validate this cri- 
terion we also counted the number of 
lever presses to extinction in normal 
extinction. Four animals were used 
and the results are shown in Figure 5. 
For comparison we also measured the 
rates of lever pressing at all intensi- 
ties. Two things emerge from the 
data: first, that extinction is a simple 
function of time since the last brain 
stimulus for all intensities of stimula- 
tion used and increases regularly with 
the intensity of the brain stimulus, 
thus confirming both our predictions; 
second, that resistance to extinction 
bears a simpler relationship to in- 
tensity of stimulation than does lever 
pressing rate. 

The latter observation seems to 
support the idea that the irregular re- 
lationship of lever pressing rate to 
intensity of brain stimulus is due to 
interfering motor effects (Hodos & 
Valenstein, 1962), and not to the 
stimulation of aversion centers by the 
higher intensities (Olds, 1958b). 

FIG. 5. Intensity, its effect on rate of responding and time and presses to extinction. (Effect 
of intensity of brain stimulation on lever pressing rates: solid line; number of presses to ex- 
tinction: circles; time to normal extinction: crosses; and time to extinction without lever 
pressing: dots.) 

Taken together, the first series of 
experiments provides strong evidence 
that extinction of self-stimulation 
habits is a result of the decay of moti- 
vational excitation. A similar rapid 
extinction of a habit maintained by 
electrical stimulation of the lateral 
hypothalamus has been found by 
Wyrwicka et al. (1959). In this ex- 
periment the habit was learned for 
food when the animal was hungry and 
was subsequently evoked in the 
satiated animal by stimulation of the 
lateral hypothalamus. In this case, 
the food reward was still available so 
that the swift cessation of responding 
could only be due to the reduction of 
the hunger drive when the stimulation 
of the lateral hypothalamus stopped. 


Experiment Ie 


In the experiment carried out in the 
Skinner box, we had to interpolate 
nonreinforced trials in order to meas- 
ure the decay of the tendency to 
respond. A different experiment was 
designed (Deutsch & Rimm²) to see 
whether this affected our findings 
seriously and also to test the gener- 
ality of these findings. Rats with 
chronic implants in the posterior 
hypothalamic region were taught to 
traverse an electrified grid for an 
intracranial stimulus. Another group 
with similar implants was taught to 
cross the grid for water when thirsty. 
After training, both groups were run 
under two counterbalanced condi- 
tions, one with trials immediately 
succeeding each other and the other 
with trials 30 seconds apart. We 
found that for the animals running 
for intracranial stimulation the mean 
of the reciprocals of running time in 
seconds (based on a total of 90 trials 
in each condition) was 0.6 for the 
no-delay condition and 0.33 for the 
30-second delay condition. For the 
group running for water the mean of 
the reciprocals of running time in 
seconds was 0.2 for the no-delay con- 
dition and 0.62 for the 30-second delay 
condition. Thus animals running for 
water speed up after a delay; those 
running for an intracranial stimulus 
slow down. 


² Unpublished manuscript entitled “Run- 
ning Speed as a Function of Delay following 
Intracranial Stimulation” by J. A. Deutsch 
and D. Rimm. 


Experiment If 


The objection may be made to all 
the experiments so far that they 
demonstrate no more than the sub- 
sidence of some general arousal or 
excitement after intracranial stimu- 
lation is withdrawn, and not the decay 
of specific motivation for the intra- 
cranial stimulus as the theory would 
demand. In order to meet this point, 
the following experiment was devised 
(Deutsch, Adams, & Metzner³). Four 
rats with chronic electrodes implanted 
in the ventral tegmental region were 
taught a T maze with water in one 
goal box and intracranial stimulation 
in the other. They were given a 
choice between an intracranial stimu- 
lus and water. They were run at 0, 
5, and 22 hours of thirst and after 
varying times following the preceding 
intracranial stimulus. It was found 
that the probability of choice of 
intracranial stimulation declined both 
as a function of conflicting thirst and 
time since the last intracranial stimu- 
lus. The criterion of drive decay here 
was not an activity measure, but rela- 
tive frequency of choice. Therefore, 
an explanation in terms of a simple 
level of arousal would be very difficult 
to sustain. 


³ Unpublished manuscript entitled “Proba- 
bility of Choice of Intracranial Stimulation 
as a Function of Interval between Intracranial 
Stimuli and Competing Drive” by J. A. 
Deutsch, D. W. Adams, and R. Metzner. 

FIG. 6. Probability of choice of brain stimulation as a function of delay and competing 
drive. (Effect of delay since last brain stimulus on the probability of choice of brain stimula- 
tion as against water, when the rat is 0, 5, and 22 hours thirsty.) 


SECOND SERIES OF EXPERIMENTS 


In our next set of experiments, we 
were concerned to prove that the 
motivational pathways stimulated by 
the electrodes were the ones which 
mediated normally motivated be- 
havior. Some relationship between 
self-stimulation behavior and normal 
drive and reinforcement has been 
shown by Olds (1958b) and Brady 
(1958), who showed an interaction 
between drive state and lever pressing 
rate. These experiments are open to 
a number of alternative explanations 
not simply related to normal motiva- 
tional mechanisms. The biochemical 


change during drive states might alter 
the electrical excitability of the tissue 
or could be increasing the reinforcing 
properties of the brain stimulus. On 
the present analysis, the Olds and 
Brady results could be explained in 
terms of a summation of motivational 
excitation from normal sources and 
from electrical brain stimulus, but 
none of the earlier experiments are 
adequate evidence for this idea. 

There are, however, two predictions 
from the theory of self-stimulation 
which, if confirmed, would provide 
good evidence for the theory and for 
the idea that stimulation of normal 
motivational pathways is involved. 

1. Animals which show extinction
of a self-stimulation habit can be re- 
called to the habit by a brain stimulus. 
This “priming” effect of the brain 
stimulus is simply explained by the 
idea that the stimulus is acting on 
motivational pathways. If this idea 
is correct, we should be able to recall 
animals to the lever by increasing 
whatever normal drive is appropriate. 




2. Since fast extinction seems to be 
due to the decay of motivational 
excitation, then extinction should be 
considerably prolonged in the presence 
of the appropriate normal motivation. 

In the next two experiments we 
attempted to test both of these 
predictions. 


Experiment IIa 


We used 10 albino Sprague-Dawley 
rats with bipolar electrodes in the 
lateral hypothalamus or ventral teg- 
mentum. After initial training, the 
rats were repeatedly withdrawn from 
the lever by hand and released in 
another part of the cage. In this 
way, the rats learned the position of 
the lever in relation to the rest of the 
cage. 

We then tried various ways of recalling the rat to the lever by normal
motivation. We had some success with hunger in the case of rats with
hypothalamic placements, but by far the most dramatic results were
obtained from the tegmental animals in relation to fear. We found that
after extinction all the tegmental animals could be recalled to the lever
by a frightening stimulus, while the hypothalamic animals, in general,
were not recalled (Deutsch & Howarth, 1962). Our procedure is illustrated
in the cumulative record shown in Figure 7. We extinguished the response
by switching off the brain stimulus and removing the animals from the
lever by hand. About 1 minute later, we sounded a loud buzzer in the cage
for a half second. This was repeated three times at approximately
30-second intervals. Then, after a further pause, we returned the animal
to the lever by hand and switched on the brain stimulus. Then, after
about 1 minute of reinforced lever pressing, we repeated the manual
extinction and sounded the buzzer as before, but this time it was paired
with an electric shock of 300 microamperes to the feet. Five of our
animals had tegmental electrodes and four had hypothalamic electrodes, as
verified by later histology; one of the animals had an electrode very far
anterior, aimed at the septum, but the appropriate part of its brain was
unfortunately destroyed during preparation for histology. The results are
summarized in Figure 8.


Fig. 7. Evocation by fear of habit learned for intracranial reward.
(cumulative record for Rat 26 showing an extinction period with three
bursts of sound from the buzzer, extinction followed by three buzzes, and
two extinction periods with three buzzer-shock combinations. Three
control extinction periods recorded on the following day are shown on the

Fig. 8. Correlation between anatomical site and recall by fear. (Results
of Experiment IIa. The abscissa gives the increased number of lever
presses produced per recall stimulus. The ordinate gives the
anterior-posterior coordinate of the electrode on de Groot’s [1959]
atlas.)


It is to be stressed that when the animals pressed after a recall
stimulus, such pressing produced no brain stimulus, so that recall was
not “reinforced.” Of the five tegmental animals only one (Number 14)
appeared frightened by the buzzer and ran to the lever and pressed it six
times, the first time the buzzer was sounded. All the other tegmental
animals ran to the lever the first time they got a 0.5-second shock to
the feet, and pressed a large number of times. Two of the hypothalamic
animals pressed the lever a large number of times the first time they got
the shock, but later experience of the shock had hardly any effect on
them. There is a highly significant difference between the behavior of
the animals with tegmental electrodes and those with hypothalamic
electrodes. Those hypothalamic animals which did recall somewhat to the
lever were the ones with the most posterior electrode placements, and
there is a rank order correlation of 0.9 between electrode placement on
the anterior-posterior axis and degree of recall. Fear did increase the
rate of reinforced lever pressing in some of our animals, but the effect
was small and not statistically significant.
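
A rank order correlation of the kind reported above can be sketched as follows. The sketch uses the standard Spearman formula for untied ranks; the function names and the data are hypothetical and stand in for electrode coordinates and recall scores that are not given in the text.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for paired samples without tied values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Hypothetical values (NOT the published data): electrode coordinate on
# the anterior-posterior axis and extra lever presses per recall stimulus.
placement = [4.0, 4.6, 5.2, 5.8, 6.4]
recall = [0.5, 1.0, 2.0, 6.0, 4.0]

rho = spearman_rho(placement, recall)
```

With only a handful of animals, a single swap of adjacent ranks changes the coefficient noticeably, so a value near 0.9 indicates an almost perfectly ordered relation.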


Experiment IIb


In the next experiment we were able to show that extinction was prolonged
in frightened tegmental animals but far less affected by fear in
hypothalamic animals (Deutsch & Howarth, 1962). We used Animals 14, 15,
19, and 23 from the previous experiment. This time we frightened the
animal with three more severe and prolonged presentations of the
buzzer-shock combination. We then returned the animal to the lever for
about a minute and then measured the number of trials and the time taken
for normal extinction. For the tegmental animals, the extinction was
extremely prolonged. Figure 9 shows the data for one of these animals.
Lever pressing continued for over 50 minutes. We tried to alternate
frightened and unfrightened extinctions but, as Figure 9 shows, the
effect of the fear carried over to the next extinction period even when
this occurred several hours later. Table 2 summarizes the data for all
four animals but very much underestimates the effect in the tegmental
animals because of this carry-over effect. Figure 9 shows some of the
extreme qualitative differences between the animals. The tegmental
animals, in this and in the previous experiment, tended to hold the lever
down for long periods when frightened. If removed from the lever in this
state, they would immediately return to it and hold it down again. This
behavior seemed a compromise between the “freezing” component of their
normal frightened behavior and the learned habit to press the lever. With
time, the animals seemed to become slightly less frightened and less
“frozen.” Then they often showed a marked increase in the lever pressing
rate as much as 30 minutes after the beginning of extinction. Such
behavior was never observed in the hypothalamic animals.


Fig. 9. Prolongation of extinction by fear. (Effect of successive
previous experience of shock on prolonged extinction of the lever
pressing habit. Each horizontal strip represents a 50-minute period of
extinction. Each bout of lever pressing is represented by a vertical line
whose height is proportional to the number of lever presses. The
horizontal “hook” at the top of some of these lines represents a
continued holding down of the lever for a time given by the length of the
horizontal line. Each 50-minute period was run on the morning or
afternoon of 3 successive days. Record for Rat 15; time axis in minutes;
strips are marked “following shock” or “no shock.”)


TABLE 2

ELECTRODE PLACEMENT AND THE EFFECT OF FEAR ON LEVER PRESSING
IN EXTINCTION

Animal              Ratio of lever presses(a)
Tegmental
  14                3.5
  15                4.6
Hypothalamic
  19                1.8
  23                0.98

a Ratio of lever presses =

To apply the present theory to this 
experiment, one has to postulate a 
primary link which is excited by 
afferent stimuli, such as pain, and 
stimuli which in the past have pre- 
ceded pain. This primary link must 
become connected to secondary links 
whose analyzers are stimulated in
association with a decrease of noxious 
stimulation. Then, in accordance 
with the rules already described, the 
cues which stimulate the analyzers of 
these secondary links will become 
attractive in the presence of noxious 
stimulation. 

In the present experiments, we must 
assume that our electrode was stimu- 
lating motivational pathways which 
can carry motivational excitation de- 
rived from noxious stimulation, caus- 
ing the animal to seek safety, and at 
the same time stimulating a reinforce- 
ment pathway which signals the 
decrease of noxious stimulation. 


THIRD SERIES OF EXPERIMENTS 


We now sought to demonstrate the 
existence of separate motivational and 
reinforcement systems. If there are 
separate systems, it seems likely that 
they are differentially sensitive to 
electrical stimulation. Preliminary 
observations indicated that the moti- 
vational pathways had a lower thresh- 
old than the reinforcement pathways. 
If a low intensity stimulus stimulated 
only the motivational pathway selec- 
tively, we would expect such stimula- 
tion to maintain an already learned 
response or at least slow down its 
extinction, even if the stimulus was 
given repetitively and independently 
of the bar pressing response. On the 
other hand, a high intensity repetitive 
stimulus administered independently 
of the response should cause the ani- 
mal to learn new “superstitious” 
habits and so, if anything, should 
hasten extinction. 


Experiment IIIa 


The first animal tested confirmed this prediction (Deutsch, Howarth,
Ball, & Deutsch, 1962). The results of this experiment are shown in
Figure 10. Low repetitive stimulation slowed extinction while repetitive
stimulation at the intensity used in the original training produced very
fast extinction. Unfortunately, no other animal of the three tested
behaved in this way. We gave the repetitive stimulus at about the same
frequency that the animal pressed the lever. As a result, we obtained a
great deal of “superstitious” or chance reinforcement of the already
established habit at the higher intensity. Although the result was
contrary to our expectations on the two other animals, the reason for
this was so obvious that we felt we could not treat it as a significant
negative result. Therefore, we redesigned the experiment slightly.


Fig. 10. Effect of subthreshold and threshold-noncontingent stimulation
on extinction. (Successive periods of extinction under three conditions:
normal extinction [crosses], extinction accompanied by a low intensity
repetitive brain stimulus [circles], and extinction accompanied by a
repetitive stimulus of the intensity used in original training [solid
circles].)


Experiment IIIb 


The reinforcing and motivating 
effects of a brain stimulus should be 
differently affected by the relative 
timing of response and stimulus. A 
reinforcing effect requires that the 
reward stimulus be contingent upon 
the response, while this is not neces-
sary for a motivational effect. So we
decided to compare the relative fre-
quency of responding for contingent
and noncontingent (repetitive) brain 
stimulation. If the stimulus has only 
a motivational effect at low intensities 
at 60 cps, then the repetitive stimula- 
tion should maintain the response 
better than the contingent brain 
stimulus, since it continues even after 
the animal has stopped responding 
and so could recall it back to the lever. 
If the higher intensity stimulus had 
both reinforcing and motivational 
properties, then the repetitive stimu- 
lus should be inferior to the contingent 
stimulus, since the animal should learn 
new interfering habits in a super- 
stitious way. 

The “high intensity” we used in this 
experiment (Deutsch et al., 1962) was 
the “threshold” intensity used in 
initial training; the “low” intensity 
was about half this. In the first half 
of the experiment 1-minute training 
trials were terminated alternately 
with 1-minute periods in which the 
animal received either contingent 
subthreshold stimulation or noncon- 
tingent subthreshold stimulation. Of 
the eight animals used (all of which 
had ventral tegmental electrodes), 
only one showed a lower mean number 


of responses in the noncontingent 
condition. Of these animals, five were 
also tested with threshold contingent 
and noncontingent stimulation. All 
of these showed a lower rate of re- 
sponding in the noncontingent con- 
dition. The data and the relevant 
statistics are summarized in Table 3. 
The one reverse result in the low in- 
tensity groups is not too surprising. 
The range between the motivational 
and reinforcement thresholds is 
probably quite small and for this 
animal we did not hit it. 
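
One simple way to look at the direction counts reported above (seven of eight animals at the low intensity, five of five at the threshold intensity) is a one-sided sign test. This is a sketch under the assumption that each animal is treated as one independent observation; it is not the analysis behind the per-animal statistics in Table 3, and the function name is hypothetical.

```python
from math import comb


def sign_test_one_sided(successes, n):
    """P(X >= successes) when X ~ Binomial(n, 0.5), i.e. under no effect."""
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


# Low intensity: 7 of 8 animals responded more under noncontingent stimulation.
p_low = sign_test_one_sided(7, 8)

# Threshold intensity: 5 of 5 animals responded less under noncontingent stimulation.
p_high = sign_test_one_sided(5, 5)
```

The test ignores the size of each animal's difference and uses only its direction, so it is conservative compared with the within-animal comparisons.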

The notion that motivational and reinforcement pathways have different
thresholds is made plausible by other evidence. Margules and Olds (1962)
have shown that learning for electrical self-stimulation is only
elicitable from certain areas in the hypothalamus which at lower
thresholds produce hungry behaviour. It has also been shown that hunger
increases the tendency to perform a habit under electrical
self-stimulation of such areas (e.g., Hoebel & Teitelbaum, 1962). It
would seem therefore that the motivational pathways subserving hunger are
being stimulated to produce performance of habits learned under
electrical stimulation of the brain, and that these pathways do have a
lower threshold at 60 cps because hunger alone can be elicited through
their stimulation at a lower intensity.


TABLE 3

EFFECT OF TWO LEVELS OF CONTINGENT AND NONCONTINGENT
STIMULATION ON RESPONDING

           Low intensity            High intensity            Microamperes
Subject    C      N      p          C        N        p       Low    High
1          69     93     .03        363      190      .001    40     100
2          70     84     .30        -        -        -       35     75
3          26     82     .0001      261      226      .20     100    175
4          21     51     .025       318      19       .0001   150    300
5          75     48     .05(a)     -        -        -       100    225
6          40     60     .14        -        -        -       55     100
7          51     69     .02        1,283    153      .0001   110    175
8          57     69     .50        1,312    470      .0001   35     80
Total      452    556               3,537    1,058

Note. C = contingent stimulation; N = noncontingent stimulation;
p = probability level of the difference.
a Significant difference in opposite direction to prediction.


Experiment IIIc 


It was reasoned (Deutsch & Stifler⁴) that if there were two separate
pathways their maxima of excitability would lie in different portions of
the stimulus frequency spectrum. Consequently, two stimuli at different
frequencies, equated for reward value, should produce unequal motivation.
Accordingly, two frequencies of stimulation (60 cps and 2,000 cps) were
each used as reward in the two goal boxes of a T maze, and their
intensity so adjusted that a rat chose each equally often. Running tests
to the two stimuli presented independently demonstrated a much slower
running speed to the 2,000 cps stimulus than to the 60 cps stimulus, both
at the previously adjusted intensities. Because of the magnitude of the
effect, a more stringent test was arranged. The two stimulating
frequencies (60 cps and 2,000 cps) were now adjusted until a moderate
preference for the 2,000 cps appeared; then the sides on which the two
stimuli were presented were reversed. The number of choices of each
stimulus in the 24 trials before reversal and the 24 immediately after
reversal were taken as a preference score. After this part of the
experiment was completed, running speeds to each of the stimuli were
tested over 25 trials when the stimuli were singly presented in the maze.
Out of the five animals tested, all showed a lower running speed to the
preferred 2,000 cps stimulus. Though on the average the animals ran to
the 2,000 cps side 30 trials as against 18 to the 60 cps side, the
average running time to 2,000 cps was 7.6 seconds and to the 60 cps, 3.6
seconds. Though the animals behaved in a thoroughly normal way after the
2,000 cps stimulus, the intensity of this stimulus was increased to check
if the lower running speed was due to some aftereffect of 2,000 cps, such
as a mild seizure. It was found that the 2,000 cps stimulus at an
increased intensity produced increased running speeds up to the maximum
of which the animal was capable. This renders an explanation in terms of
some disabling aftereffect most unlikely. It seems then that the drive,
as measured by running speed induced by an electrical stimulus, can vary
independently of the reward value of such a stimulus. This makes it
plausible that two systems, such as those postulated by the theory, are
indeed involved.


4 Unpublished manuscript entitled “Differential Frequency Sensitivity of
Motivational and Reinforcement Systems” by J. A. Deutsch and R. B.
Stifler.
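
The dissociation between preference and running speed in Experiment IIIc can be made explicit with the averages quoted in the text. This is a minimal sketch: the variable names are illustrative, and treating speed as the reciprocal of the average running time is an assumption made only for this example.

```python
# Averages quoted in the text for Experiment IIIc.
choices_2000_cps = 30      # choices of the 2,000 cps side out of 48 trials
choices_60_cps = 18        # choices of the 60 cps side out of 48 trials
time_2000_cps_s = 7.6      # average running time to 2,000 cps, in seconds
time_60_cps_s = 3.6        # average running time to 60 cps, in seconds

# Reward value is indexed by preference: the share of choices of each side.
preference_2000 = choices_2000_cps / (choices_2000_cps + choices_60_cps)

# Drive is indexed by running speed; speed is taken here as 1 / time
# (an assumption for illustration, not the authors' stated measure).
speed_2000 = 1.0 / time_2000_cps_s
speed_60 = 1.0 / time_60_cps_s

# The two measures dissociate when the preferred stimulus is run to more slowly.
dissociated = preference_2000 > 0.5 and speed_2000 < speed_60
```

A single underlying strength of the stimulus would predict that the preferred stimulus is also approached faster, so the opposite ordering is what points to two systems.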


DISCUSSION 


Though the experiments which have 
been conducted to test the theory 
have proved favorable, the question 
will be asked whether evidence ob- 
tained by others creates no difficulty 
for the theoretical interpretation es- 
poused above. In answer to this 
question, it should be said that no 
directly contradictory evidence seems 
to be available. However, there are 
experiments which indicate that 
certain areas of investigation might 
turn up such evidence. In particular, 
Stein’s (1958) work on secondary re- 
inforcement indicates that under con- 
ditions which have not yet been ade- 
quately defined, stimuli associated 
with intracranial stimulation can 
acquire secondarily reinforcing prop- 
erties. It seems that animals will 
work for stimuli previously associated 
with intracranial stimulation without 
priming. Although such an effect is 
small, the present theory would have
to assume that the electrodes in the 
case of the animals that show such a 
phenomenon are in motivational path- 
ways which are active when the 
secondary reinforcer is sought out. 
A second difficulty concerns the varia- 
tion of threshold of intracranial stimu- 
lation and the rate of response with 
drive state. Olds (1958b) reports that 
in the case of some animals such effects 
occur only over a narrow range of 
intensities. Though the theory would 
not, on the face of it, predict such a 
phenomenon, again further examina- 
tion of the issue would be necessary 
before such evidence could be held to 
be contradictory. Somewhat in a 
different way, the data of Roberts 
(1958) and Bower and Miller (1958) 
on placements which yield reinforce- 
ment and avoidance cannot be ex- 
plained by the theory. However, if 
such effects are regarded as being due 
to the neighborhood of two areas with 
different functions, the theory, not 
being anatomical, would not reasona- 
bly be expected to predict here, any 
more than it could predict the strange 
assortment of forced movements and 
other reactions connected in a seem- 
ingly haphazard manner with intra- 
cranial self-stimulation depending on 
electrode placement. 

Though it may seem that the theory 
here put forward to explain phe- 
nomena of learning under electrical 
stimulation of the brain makes im- 
plausible assumptions of an ad hoc 
nature, it should be stressed that these 
assumptions were not invented for the 
purpose of explaining these data. 
They are already parts of a coherent 
larger theory put forward to account 
for evidence in other fields of experi- 
mental and physiological psychology. 
This theory has been able to explain 
many of the features of eating and 
drinking behavior and to predict a 


paradoxical effect in drinking prefer- 
ence (Deutsch & Jones, 1960). Seem- 
ingly contradictory evidence on drive 
discrimination has been accounted for
and the explanation confirmed by 
further experimentation (Deutsch, 
1959). Goal alternation was pre- 
dicted as were certain improbable 
outcomes of experiments on excitation 
(Deutsch & Anthony, 1958; Deutsch 
& Clarkson, 1959b). Further, it was 
possible to apply the theory to reason- 
ing in rats and to test this interpreta- 
tion experimentally (Deutsch & 
Clarkson, 1959a). The present series 
of experiments was designed to test 
the physiological soundness of some 
of the assumptions concerning moti- 
vation and reinforcement which the 
theory makes. 


AUTHOR’S NOTE 


Evidence that will make it possible to look 
for the two postulated systems at a histologi- 
cal level has recently been found by the senior 
author. It has been possible to measure the 
length of the neural refractory period of the 
pathways whose excitation produces the phe- 
nomena of intracranial self-stimulation by 
behavioral means. Stimulation by equal pulse 
pairs (monophasic, .1 millisecond duration), 
repeating at 10 millisecond intervals, is used. 
The interval between the two pulses in each 
pair is varied in steps of .1 millisecond. It is 
found that running speed varies considerably 
as a function of this interval, as does reward 
value as measured by preference. For pulse 
pair intervals of .1 to .2 millisecond, the 
period of latent addition is observed. The 
effect of the second pulse gradually diminishes 
until from .2 to .5 millisecond after the be-
ginning of the first pulse the second pulse is
entirely occluded. From then until .5 to .6
millisecond, stimulation with two pulses is 
equivalent to stimulation with only one of 
them. Results so far indicate that “reinforce- 
ment” pathways, as measured by preference, 
differ in their parameters from the ‘motiva- 
tional” pathways, having a shorter refractory 
period and differing also in the period of latent 
addition. Though significant differences be-
tween the reinforcement and motivational
parameters have been obtained in each of the
few animals so far tested, the methods used
do not give a precise numerical estimate. So
far, it can be stated with confidence that the
occlusion of a second pulse at certain intervals 
occurs, at times consistent with the belief that 
this is due to the neural refractory period, so 
that we can measure the refractory period of 
the fibers involved by using purely behavioral 
means. It also seems probable from the re- 
sults so far obtained that the refractory period 
for the reinforcement effects induced is shorter 
than for the motivational effects. The length 
of refractory period correlates well with fiber 
diameter and so fibers producing reinforce- 
ment effects are probably larger than those 
concerned with motivation. 
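
The paired-pulse stimulation described in this note can be laid out as a timing schedule. The sketch below assumes that the interval within a pair is measured from the beginning of the first pulse to the beginning of the second, as the note suggests; the function name and the upper limit of 1.0 millisecond for the tested intervals are hypothetical.

```python
def pulse_pair_onsets(pair_interval_ms, n_pairs, pair_period_ms=10.0):
    """Onset times (ms) of every pulse in a train of equal pulse pairs.

    Pairs start every pair_period_ms; the second pulse of each pair
    starts pair_interval_ms after the first.
    """
    if not 0 < pair_interval_ms < pair_period_ms:
        raise ValueError("the pair interval must fit inside the pair period")
    onsets = []
    for k in range(n_pairs):
        start = k * pair_period_ms
        onsets.append(start)
        onsets.append(start + pair_interval_ms)
    return onsets


# Intervals stepped by 0.1 ms; the 1.0 ms ceiling is an assumption.
intervals_ms = [round(0.1 * step, 1) for step in range(1, 11)]

# One short train (three pairs) for each tested interval.
schedule = {interval: pulse_pair_onsets(interval, n_pairs=3)
            for interval in intervals_ms}
```

If the second pulse falls inside the refractory period of the fibers excited by the first, a train of pairs should act like a train of single pulses, which is the behavioral signature the note relies on.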


PERCEPTION OF LOW AUDITORY PITCH:
A MULTICUE, MEDIATION THEORY¹


WILLARD R. THURLOW 


University of Wisconsin 


Many Ss can hear a low pitch to certain complex auditory stimuli even 
when there is no physical frequency corresponding to this low pitch. 
The hypothesis is advanced that in the majority of cases, the low pitch 
is related to a mediating vocal response. Experiments were performed 
which support the mediation hypothesis. Further experiments with 
trains of pulses lead to the conclusion that there is no precise time 
analyzing system for determining low pitch. 


It has long been puzzling to audi- 
tory theorists why it is that subjects 
can hear a low pitch to certain com- 
plex auditory stimuli even when there 
is no physical frequency corresponding 
to this low pitch. (For discussion of 
previous work in this area, see Garner, 
1952; Jeffress & Moushegian, 1959; 
Licklider, 1959; Pollack, 1961.) It 
will be the purpose of this paper to 
review some recent evidence concern- 
ing this problem, and to show how a 
mediation hypothesis appears to offer 
a solution. 


CUES TO LOW PITCH

Role of Spectrum (Overtone Structure) 


An illustration of spectrum, or overtone structure (a sample 1,400 cycles
per second through 3,800 cycles per second), and possible cues it
contains of importance for perception of low pitch, is given in Figure 2.
This spectrum has frequency components (harmonics) at multiples of 100
cycles per second (the “fundamental”), and these components may act as
cues. In addition, there is a pattern of maxima and minima in the
spectrum lines (height of line for each component indicates strength of
that component); or we could say that the “envelope” of the spectrum
lines shows peaks. The locations of the maxima in this envelope may act
as a cue.


1 This research was supported by a grant from the Research Committee of
the Graduate School of the University of Wisconsin, from funds provided
by the Wisconsin Alumni Research Foundation. I am also indebted to the
Research Committee of the Graduate School of the University of Wisconsin
for a research leave granted to work on these problems. The present
review was prepared in connection with the writer's appointment as an
American Psychological Association-National Science Foundation Visiting
Scientist, 1962.


Licklider (1959) has reviewed ear- 
lier work on the problem of whether 
the frequency components in a spec- 
trum (harmonics) can act as a cue to 
low pitch. He favors the interpreta- 
tion that they can do so only when the 
phases of the components are such 
that pulses are formed; the pitch then 
corresponds to the pulse rate. On the 
other hand, Flanagan and Guttman 
(1960a, 1960b) separated the vari- 
ables of pulse rate and fundamental 
frequency (by manipulating polarity 
of pulses), and found that for stimuli 
with fundamental frequencies in ap- 
proximately the range 200-500 cycles 
per second, matching of pitch takes 
place corresponding to the funda- 
mental frequency (whether physically 
present or not). 
Another type of spectral cue, which we mentioned earlier, may come from
the envelope of the spectrum (as illustrated in Figure 2). Results of
experiments by Thurlow and Small (1955) can be best interpreted from the
point of view that the pitches heard were cued by the envelope pattern of
the stimuli. In these experiments, double pulse trains were used. Figure
1 illustrates such a stimulus. Pulses from pulse generators (.1
millisecond duration) were led through a filter (in the illustration, UTC
Model 4C, set at 1,000 cycles per second band-pass). Pulses x and x'
represent two pulses (of many) of a repeated train of pulses; Pulses y
and y' represent two pulses (of many) of another repeated train. The
basic pulse rate of each train was held constant (Interval x' - x =
y' - y). (In Figure 1 the basic pulse rate was 50 per second; in these
experiments the basic pulse rate was most often set at 100 pulses per
second.) The interval y - x = y' - x', however, was varied. As the
interval was shortened, our subjects heard a pitch (which we have called
a “time difference” pitch or “sweep” pitch) which was correlated with
this time interval. At the time we ran these experiments we thought that
the pitches were the result of the action of a time analyzing system
(though we were puzzled as to why there was no pitch corresponding to the
x' - y time interval). Now there is evidence against the existence of
such a time analyzing system, which I shall review later in this report.
Therefore I interpret the pitches heard as being the response to the
pattern in the spectrum (see also Thurlow, 1957). There is a correlation
between the pitch heard and the spectrum pattern, for stimuli we have
used. If the y - x = y' - x' interval is 2 milliseconds, for instance,
the peaks in the envelope tend to be at multiples of 500 cycles per
second, and our subjects heard a pitch which could be matched to one near
500 cycles per second. We can note a possible relation of pitch perceived
to the fundamental of a vocal tone which could be sung by the subject,
the harmonics of the vocal tone corresponding to the peaks in the
envelope of the pattern. For fundamentals corresponding to vocal tones
higher than the person can sing, the singer’s vocal response could have a
fundamental one or more octaves below that of the harmonic pattern he was
matching. Singers often have a good deal of experience in matching voices
and instruments one or several octaves away. If time difference pitches
are matched successively from low to high, with an oscillator tone that
can be varied from low to high, the subject can keep track of the number
of octaves he has covered by the number of times he has to transpose his
voice down an octave.


Fig. 1. Stimuli produced by leading .1 millisecond pulses through a 1,000
cycles per second band-pass filter. (Pulses x and x' represent two pulses
[of many] of a repeated train; y and y' represent two pulses of another
repeated train. The interval between pulses of the two trains
[y - x = y' - x'] is varied.)


Fig. 2. Illustration of a portion of the spectrum of a double pulse train
stimulus (unfiltered) where the y - x = y' - x' interval is close to 1.1
milliseconds (and where individual pulses are .1 millisecond in
duration). (Component lines represent relative strength of frequency
components, 1,400 through 3,800 cycles per second spaced at intervals of
100 cycles per second, as recorded by a wave analyzer.)
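
The relation between the interval separating the two pulse trains and the peaks of the spectrum envelope can be sketched numerically. The sketch assumes two identical, idealized pulse trains, so that the harmonic at frequency f has relative strength 2|cos(pi f d)| for a delay d between the trains; the shape of the individual pulses and any filtering are ignored, and the function names are hypothetical.

```python
import math


def harmonic_envelope(delay_ms, pulse_rate_hz=100, max_hz=2000):
    """Relative strength of each harmonic for two identical pulse trains,
    the second delayed by delay_ms relative to the first."""
    delay_s = delay_ms / 1000.0
    return {f: 2.0 * abs(math.cos(math.pi * f * delay_s))
            for f in range(pulse_rate_hz, max_hz + 1, pulse_rate_hz)}


def envelope_peaks(envelope):
    """Harmonics that are stronger than both of their neighbours."""
    freqs = sorted(envelope)
    return [cur for prev, cur, nxt in zip(freqs, freqs[1:], freqs[2:])
            if envelope[cur] > envelope[prev] and envelope[cur] > envelope[nxt]]


# A 2 millisecond interval between trains with a basic rate of 100 pulses
# per second, as in the example given in the text.
peaks_2ms = envelope_peaks(harmonic_envelope(2.0))
```

Under these assumptions the peaks fall at multiples of the reciprocal of the interval, which is the regularity the text describes for the 2 millisecond case.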

It should be emphasized that our 
discussion so far concerning time 
difference pitches refers to perceptions 
of certain musically trained people. 
In a study by Thurlow and Hartman 
(1959) it was found that just as only a 
certain number of people with musical 
training hear the “missing funda- 
mental,” so only a certain number 
hear the time difference pitches ac- 
curately. Jenkins (1961) has reported 
that those of his subjects who could 
match any pitch at all in the double 
pulse-train stimuli (similar to those 
we have been describing) heard a 
pitch corresponding to the basic pulse 
rate when pulses y and y’ were close to 
x and x’, and a pitch an octave higher 
more prominently as interval x’ — y 
approached equality with y— x. 
These were pitches which we reported 
(Thurlow & Small, 1955) were heard 
most clearly by our subjects when 
high-frequency filtered pulses were 
used. We will discuss these latter 
pitches later in this paper when 
talking about intermittency cues. 
When lower frequencies are included 
in the stimulus, the low frequency 
harmonics give a cue for a low pitch 
corresponding to the fundamental of 
these harmonics. There is no question 
but that some subjects can hear only 
this latter pitch, or one related to the 
intermittency. We have recently run 
experiments with two samples of 16 
subjects each, in which the subjects 
tried to sing a match to the pitch they 
heard in stimuli similar to those 
illustrated in Figure 1. Many sub- 
jects heard only a low pitch correlated 
with the fundamental of the har- 
monics, or with the intermittency. 
On the other hand some subjects 
heard pitches corresponding to the 
pattern in the envelope of the har- 
monics, which we have called time 
difference pitches. It is of interest to 
note that Meyer (1962) has recently 
rediscovered the time difference 
pitches, and has no qualifications in 
classifying them as pitches. 


A Mediation Hypothesis 


Other investigators have noted the 
existence of individual differences in 
whether pitch is perceived in the 
missing fundamental situation, in 
response to pulses of noise or tone. 
(See, for example, Davis, Silverman, 
& McAuliffe, 1951; Small, 1955.) 
The problem is what causes one 
person to hear a low pitch and another 
not to. 

In the Thurlow and Hartman 
(1959) study, it was noted that some 
of the subjects who matched the 
complex stimuli to low pure tones 
were using a vocal humming reaction 
as an intermediate response. This 
suggested the hypothesis that the 
matching was a response-mediated 
type of equivalence. Mediation by an 
overt humming response can be 
labeled “overt humming mediation.” 

Mediated equivalence of cues has 
been of interest to learning psycho- 
logists for a long time (cf. Kimble, 
1961, pp. 355-358). However, there 
are special features to this mediation 
in some subjects of our experiments. 
These subjects report that they “hear” 
a low pitch in these complex stimuli, 
and they hear this without making an 
overt humming or singing reaction. 
It seems possible, therefore, that the 
mediating reaction can be one which 
includes subvocal muscular reactions 
in the vocal cords, with associated 
kinesthetic feedback; and also some 
auditory imagery associated in the 
past with making the overt singing 
reaction. Many musically trained 
subjects can hear themselves sing 
without actually singing; this could be 
considered related to “conditioned 
sensations” (cf. Hefferline & Perera, 
1963; Leuba, 1940). Mediation of 
the type just hypothesized can be 
labeled “implicit humming media- 
tion.” The tendency to “externalize” 
these reactions, and to perceive them 
as part of the stimulus is undoubtedly 
related to a tendency to externalize 
noted, for instance, in visual per- 
ception. 

This point of view, while denying 
that the perception is relatable simply 
to stimulus properties (cf. Gibson, 
1959), nevertheless indicates some 
reasons why the perception may not 
(to the subject) appear to involve an 
arbitrary mediating reaction. The 
close link between “response” and 
“perception” in such cases would 
make it difficult to separate perception 
and response (cf. Garner, Hake, & 
Eriksen, 1956) unless one were able 
to go back in the history of the 
organism to trace the development of 
these perceptual responses. On the 
other hand, we should like to avoid 
suggesting that the perception consists 
only of a cued mediating response. 
There are other aspects of the percep- 
tion (roughness, buzzing, for instance) 
which can be related to the stimulus, 
pointing to the existence of other 
additional internal events. 

The present “mediation” hypothesis 
is related to that which Liberman 
(1957) felt it necessary to propose to 
understand why certain consonant 
sounds (such as g, followed by various 
vowels) sounded the same even though 
the acoustical stimulus changed mark- 
edly. Liberman hypothesized that 
the similarity of perceived sounds was 
due to the fact that a common arti- 
culatory response had been made 
many times to them. He felt it 
necessary to postulate a neural representation 
of such responses, however, 
which could be utilized very 
rapidly in determining perception of 
the sound. 

The role of vocal response in pitch 
discriminations has been recognized 
for some time. Wyatt (1945) has 
reviewed early work in this area. 
Training subjects by having them 
give a vocal response to auditory 
stimuli aids them in discriminating 
the stimuli. Extensive data obtained 
in Russia, in studies directed by 
Leonteyev (see review by Pick, 1960), 
show that training with a nonvocal 
motor response, as well as with a 
vocal motor response, aids in dis- 
crimination. 

As a first step in testing the 
mediation hypothesis which we have 
proposed, we decided to set up an 
experiment in which we might demon- 
strate a “mediated equivalence” of 
pitch depending on a learned vocal 
response. 


Method 


Procedure. A group of 18 undergraduates, 
all with musical training, were randomly 
assigned to each of two conditions, I and II 
(9 to each condition). The subjects took part 
in a test session, two practice sessions, and 
then a final test session. 

Tests. A “Buzz-tone” test (33 items) was 
given before and after the practice sessions. 
Double pulse-train stimuli were used, as 
illustrated in Figure 1 (and described in an 
earlier section). The basic pulse rate of xx’ 
and yy’ pulses was 50 per second. (Thus the 
xx’ interval, for instance, was 20 milliseconds.) 
Pulses (.1 millisecond duration) were pro- 
duced by Tektronix pulse generators, and 
filtered by passing them through a UTC band-pass 
filter (Model 4-C) set at 1,000 cycles per 
second. For each item in the test, the pulse- 
train stimuli were turned on with a separation 
between x and y pulses of 10 milliseconds; 
this interval was then shortened to 8 milli- 
seconds, and this stimulus left on for 1 second. 
(This first stimulus was only to call attention 
to the time difference pitch by changing it.) 


*Bhalchandra Bhatt assisted in carrying 
out the experiment. 

Then, after a second of silence, the double 
pulse train was turned on again for a second 
with one of the y — x separations: 7.2, 7.6, 
8.0, 8.5, or 8.9 milliseconds. Finally, after 
another second of silence, a pure tone was 
turned on for a second at a frequency of 111, 
118, 132, or 140 cycles per second. The 
subject was asked to say whether the pure 
tone was the same, higher, or lower, than the 
pitch of the second buzzing stimulus. Items 
were recorded on tape (Magnecorder PT6-J). 
The signals to be presented to the subject 
were led from the tape recorder, through an 
attenuator and transformer to a small loud- 
speaker placed in a room, the ceiling and walls 
of which were lined with Fiberglas wedges. 
A sensation level close to 45 decibels was used. 
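
As a worked illustration of the test items, the sketch below converts each y — x separation to a frequency. It assumes, following the earlier example in which a 2-millisecond interval was matched near 500 cycles per second, that the time difference pitch lies near the reciprocal of the separation; the function names are hypothetical and the stimulus values are those listed in the Method.

```python
def time_difference_pitch_hz(separation_ms):
    """Reciprocal of the y - x separation, in cycles per second
    (assumed relation, taken from the 2 ms -> 500 cps example)."""
    return 1000.0 / separation_ms

def nearest_tone(target_hz, tones_hz):
    """Comparison tone closest in frequency to target_hz."""
    return min(tones_hz, key=lambda tone: abs(tone - target_hz))

separations_ms = [7.2, 7.6, 8.0, 8.5, 8.9]   # separations listed in the Method
comparison_tones_hz = [111, 118, 132, 140]   # pure tones listed in the Method

for sep in separations_ms:
    pitch = time_difference_pitch_hz(sep)
    tone = nearest_tone(pitch, comparison_tones_hz)
    print(f"{sep} ms -> {pitch:.1f} cps (nearest comparison tone: {tone} cps)")
```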

A “Tone-voice” test was also given to all 
the subjects (25 items). This test was con- 
structed entirely analogously to the Buzz-tone 
test, but a pure tone was substituted for the 
buzz, and a male voice was substituted for the 
tonal stimulus. The purpose of this test was 
to find whether the subjects could match a 
low tone to a low voice (singing the same 
fundamental as the low tone). All subjects 
scored significantly better than chance on 
this test. The Tone-voice test was given to 
the subjects before the Buzz-tone test. 

Practice. A practice set of stimuli (25 
items) was made up with double pulse-train 
stimuli similar to those in the Buzz-tone test; 
but in this set, no pure tone occurred. Con- 
dition I: The subjects were asked to try to 
sing a pitch the same as that of the second 
buzzing stimulus. Close to 5 seconds after 
the second buzzing stimulus on the tape, a 
voice sang a pitch corresponding to the time 
difference pitch of the second buzzing stimu- 
lus. The subjects could compare their sung 
note with that given on the tape. Condition 
II: The subjects listened to the same tape, 
with instructions to listen carefully (being 
told they would make use of this information 
later); but they did not try to sing a match 
to the second buzzing stimulus. The “prac- 
tice” tape was presented three times during 
each practice session. 


Results 


Statistical tests (t tests) run on the 
groups of Conditions I and II, showed 
that only the subjects of Condition I 
(who had practiced singing a match) 
had made a significant improvement 
(beyond .05 level of significance) in the 
Buzz-tone test. Five of the subjects 
in Condition I scored above chance 
in the Buzz-tone test after training 
(but not before). One subject in 
Condition II scored above chance 
both before and after the practice 
session, but made no significant 
improvement. It is concluded that 
training in the use of a mediating 
vocal response produces a significant 
tendency for mediated equivalence of 
the pitch of pulse and pure tone 
stimuli. 


Role of Intermittency 


Licklider (1959) has reviewed num- 
erous experiments showing that low 
pitch may be perceived to intermittent 
stimuli. He notes that Cramer (1955) 
has found a pitch correlated with 
stimulus intermittency, independent 
of pitch due to momentary 
cues (which may be present with 
bursts of thermal noise). Mathes 
and Miller (1947), Small (1955), and 
Flanagan and Guttman (1960a, 1960b) 
have reported production of pitch 
correlated with stimulus intermit- 
tency, not confounded with spectrum 
cues. Miller and Taylor (1948) 
found that their subjects could match 
the pitch of intermittent bursts of 
thermal noise to a pure tone up to 
approximately 200 bursts per second. 

In a previous experiment (Thurlow 
& Small, 1955), we found that using 
sharp high-frequency band-pass filter- 
ing of double pulse trains (with two 
UTC Model 4-C filters in series, each 
set at 5,000 cycles per second band- 
pass) led to distinct differences in 
pitch perception (from when a single 
filter was used). Figure 1 can again 
be used to illustrate the double pulse- 
train situation even though the filtering 
was different. It was observed 
that as pulse y (and others of its train) 
was moved with respect to x (and 
others of its train) the subjects heard 
the pitch jump to an octave above 
the basic pulse rate when it was approximately 
within the middle third 
of the xx’ interval. A pitch corre- 
sponding to the basic pulse rate was 
otherwise heard (except for a short 
transition region). These results on 
the “jump to the octave” are similar 
to those Békésy (1961b) has recently 
found for perception of skin vibration. 
The results can be interpreted as 
meaning that perceived intermittency 
can be a cue to pitch. The results are 
probably different from the Seebeck 
effect (see Békésy, 1961b) because the 
lower frequency part of the spectrum, 
with its pattern of separate harmonics, 
(which can act as a cue to low pitch) 
is eliminated in our experiment. The 
changes in pitch heard cannot be 
related to spectral cues. Components 
in the spectrum are spaced at mul- 
tiples of the basic pulse rate, and could 
not account for the jump to the 
octave; any patterning present in the 
envelope of the harmonics would be 
correlated with the time difference 
pitches, not with the jump to the 
octave. These original subjects (who 
had considerable musical training) 
could match the perceived pitch 
(correlated with perceived intermit- 
tency) to a low pure tone. 
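
The “jump to the octave” just described can be summarized in a short sketch. It assumes a sharp boundary at the middle third of the xx’ interval and ignores the short transition region mentioned above; the function name and the sample offsets are illustrative only.

```python
def intermittency_pitch_hz(pulse_rate_hz, offset_s):
    """Pitch cued by perceived intermittency for a double pulse train.

    Returns an octave above the basic pulse rate when pulse y falls within
    the middle third of the xx' interval, and the basic pulse rate otherwise.
    The transition region is ignored (simplifying assumption).
    """
    period_s = 1.0 / pulse_rate_hz
    phase = (offset_s % period_s) / period_s
    in_middle_third = 1.0 / 3.0 <= phase <= 2.0 / 3.0
    return 2.0 * pulse_rate_hz if in_middle_third else pulse_rate_hz

# Illustrative offsets for a basic pulse rate of 100 per second.
for offset_ms in (1.0, 4.5):
    print(offset_ms, "ms ->", intermittency_pitch_hz(100.0, offset_ms / 1000.0), "cps")
```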

Individual differences, as might be 
expected, turn out to be important 
also in perception of these pitches. 
An experiment was carried out by the 
author recently with 22 subjects (10 
male and 12 female) with stimuli 
similar to those described above. 
This experiment was performed not 
only to investigate individual differ- 
ences in reactions to these intermit- 
tent stimuli; but also to test reactions 
to these stimuli when a substantial 
masking stimulus was added to the 
low frequency region, since Békésy 
has pointed to the possible existence 
in the inner ear of low frequency 
“residues” caused by intermittency 
(Békésy, 1961a). 


Method 


Procedure. A double train of pulses (each 
.1 millisecond in duration, from Tektronix 
pulse generators) was sent through two 5,000 
cycles-per-second band-pass filters (UTC 
model 4C) and set at 20 decibels sensation 
level in the subject's earphone (Telephonics 
TDH-39). Each pulse train was set at 100 
per second; thus the xx’ interval (see Fig. 1), 
for instance, was 10 milliseconds. (An earlier 
experiment with 18 male subjects with very 
similar stimuli showed that all but 2 subjects 
could discriminate the intermittencies. The 
faster perceived intermittency for these 
subjects corresponded to a separation of 4 
milliseconds between pulse trains (y — x in- 
terval), as compared to a 1 millisecond 
separation.) A masking stimulus (thermal 
noise, filtered so as to pass frequencies below 
500 cycles per second) was set at 50 decibels 
sensation level in the subject’s earphone. 

The subjects were asked to match the pulse 
stimulus to a pure tone from an oscillator, 
which was switched on alternately with the 
pulse stimulus. Four matches were made 
for each pulse separation condition. 


Results 


Among the male subjects, two 
discriminated perfectly even with this 
short series of matches. Median fre- 
quency matched by these two subjects 
to the 1 millisecond separation was 
102 cycles per second, and to the 4.5-millisecond 
separation was 199 cycles 
per second. Another male subject 
(who had much musical training, and 
reported using “mental humming” to 
aid in making matches) mentioned 
that the pitch for the 4.5-millisecond 
separation sounded an octave above 
that for the 1-millisecond separation. 
The results of this subject indicate the 
existence of what we have labeled 
“implicit humming mediation.” The 
median matches for this subject were 
91 cycles per second and 180 cycles per 
second for the 1- and 4.5-millisecond 
separations, respectively. Two of his 
tone settings were an octave above 
the others, a type of “octave error” 
noted previously (Davis, Silverman, 
& McAuliffe, 1951; Thurlow & Hartman, 
1959). Other male subjects did 
not show significant discrimination. 
However, all but one subject gave 
pure tone matches in the low fre- 
quency region. 

Among female subjects, 11 out of 12 
gave a median match to the 1-milli- 
second separation condition which 
was lower than to the 4.5-millisecond 
condition. (This result is statistically 
significant beyond the .01 level.) The 
median match for all 12 subjects to the 
1-millisecond separation condition was 
213 cycles per second, and to the 4.5-millisecond 
separation condition was 429 cycles 
per second. 

A mediation hypothesis again seems 
to be useful in explaining these results. 
Individuals with much training in 
singing can respond with a vocal 
response (or a subvocal response) to 
certain types of intermittent stimuli. 
In their musical experience, they have 
often sung matching notes to voices 
and instruments which have a notice- 
able intermittency in their low notes. 
They have also sung matches to non- 
intermittent stimuli with a low prom- 
inent fundamental. Thus the vocal 
response can act as a response mediat- 
ing the perceived similarity. We 
found that the majority of subjects 
in this experiment reported using 
overt or mental humming. However, 
there may be other possible mediating 
responses. For instance, one female 
subject who matched without any 
humming, reported that she was using 
a comparison in terms of “highness” 
or “lowness.” 


Role of Time Cues 


In this section evidence will be 
discussed which argues against the 
existence of a precise time analyzing 
system for low pitch. This evidence 
is discussed because it makes some 
other hypothesis such as the media- 
tion hypothesis necessary to explain 
the various kinds of low pitch perception. 

We have earlier pointed out (Thur- 
low, 1958) one way for testing whether 
a precise auditory time analyzing 
system exists. We will now describe 
a more extensive test using these same 
ideas. 


Method 


Procedure. A group of 16 female subjects 
was selected. (Female subjects in the popula- 
tion from which we obtain volunteers usually 
have considerably more musical training than 
males, and could be expected to make tonal 
discriminations more precisely.) These sub- 
jects were tested on pure tone frequency DL 
at 125 cycles per second (which involves a 
separation between neural volleys of 8 milli- 
seconds), and also, at the same time, tested 
with pulses for discrimination of a change in 
timing of nerve impulses (from a standard 
separation of 8 milliseconds). Pulse stimuli 
used were produced in the same way as 
described previously (see Figure 1). The 
basic pulse rate of xx’ and yy’ pulses was 50 
per second. The interval y — x = y’ — x’ 
was again set at 8 milliseconds. A single 
1,000 cycles-per-second band-pass filter was 
used for the pulses (UTC Model 4-C). 

For testing with the pulse stimuli, first one 
of the pulse trains was reversed in polarity, 
and then the other—the first being returned 
to its original polarity. (With reversal in 
polarity, a downward deflection in Figure 1 
becomes an upward one, and vice versa.) 
Physiological evidence (see Stevens & Davis, 
1938, p. 389) indicates that reversing a (low 
frequency) pulse results in a change in time 
of firing of the nerve impulse of one-half a 
period (due to the fact that the nerve firing 
occurs only once per cycle, corresponding to 
the time when the stapes is starting to move 
outward). We checked this inference by 
putting a probe pulse in the other ear, and 
using localization centering in the median 
plane as a measure of pulse coincidence in 
time (cf. Guttman, Bergeijk, & David, 1960). 
We found that reversing either x or y pulse 
resulted in a delay of .5 millisecond in the 
neural pulses corresponding to the stimulus 
pulse. It should be noted that delaying the y 
neural pulse by .5 millisecond increases the 
time interval between x and y neural pulses; 
while delaying the x neural pulse by .5 milli- 
second decreases the time interval between 
x and y neural pulses. Shifting from one of 
these conditions to the other therefore results 
in a change of 1 millisecond in time separation 
between x and y neural pulses. If there is a 
precise time analyzer for pitch, the subjects 
should be able to hear this change. 
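
The arithmetic of the polarity-reversal test can be checked with a brief sketch. It takes the half-period delay from the 1,000 cycles-per-second filter setting and the 8-millisecond standard separation given above; the function names are hypothetical.

```python
def half_period_ms(center_frequency_hz):
    """Half a period of the filtered pulse, in milliseconds."""
    return 1000.0 / (2.0 * center_frequency_hz)

def neural_interval_ms(standard_ms, x_delay_ms, y_delay_ms):
    """Interval between x and y neural pulses after each is delayed.
    Delaying y lengthens the interval; delaying x shortens it."""
    return standard_ms + y_delay_ms - x_delay_ms

delay = half_period_ms(1000.0)                    # delay produced by one reversal
y_reversed = neural_interval_ms(8.0, 0.0, delay)  # y train reversed
x_reversed = neural_interval_ms(8.0, delay, 0.0)  # x train reversed
change_ms = y_reversed - x_reversed
percent_change = 100.0 * change_ms / 8.0          # relative to the standard separation

print(delay, y_reversed, x_reversed, change_ms, percent_change)
```

On these figures the shift between the two reversal conditions amounts to more than 10% of the standard separation.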

Twenty items were presented which in- 
volved comparison of the pitch when first one 
of the pulse trains was reversed in polarity, 
and then the other. In half of the items, a 
given pulse train was reversed first; in the 
other half of the items this pulse train was 
reversed second. Each stimulus was on for .4 
second, separated by a silent interval of 1.3 
seconds. Interspersed with these items were 
60 items to measure the frequency DL at 125 
cycles per second for the same subjects. 


Results 


Results showed that none of the 10 
subjects could discriminate the change 
in timing of nerve impulses (rep- 
resenting over a 10% change in time 
separation). We therefore conclude 
that there is no precise time analyzing 
system for low pitch. On the other 
hand, 5 of the subjects could dis- 
criminate changes of less than 2 cycles 
per second in the 125 cycles-per- 
second tone; and 9 could discriminate 
changes of 3 cycles per second or less. 
We conclude that this precise dis- 
crimination must somehow be related 
to shift in place of stimulation in the 
nervous system, rather than being 
dependent on a precise time analyzing 
system. (We can also conclude that 
perception of time difference or sweep 
pitches is not dependent on time 
analysis, since some subjects can de- 
tect a pitch change in these situations 
when the time difference is changed 
considerably less than 10%.) 

In addition to the results just 
described, there is other evidence 
which argues against the existence of 
a precise time analyzing mechanism 
for low pitch. Some subjects cannot 
match the pitch of a sine wave 
stimulus to that of an intermittent 
noise stimulus unless the sine wave 
stimulus is so low that it is perceived 
as intermittent. (See for instance, 
Mowbray, Gebhard, & Byham, 1956; 
Thurlow & Hartman, 1959.) Bursts 
of noise produce neural volleys in the 
auditory nerve up to very high rates 
(Peake, Goldstein, & Kiang, 1962), 
just as well as, or better than, low 
frequency sine waves. If timing of 
volleys was the cue to pitch of low 
pure tones, it should be possible for 
all subjects to match rate of noise 
pulses to tone frequency of low fre- 
quency sine waves. Since all subjects 
cannot do this, one can conclude that 
the pitch of pure tone stimuli above 
approximately 60-70 cycles per second 
is determined by another cue—pre- 
sumably a place cue. 

To the extent that some subjects 
hear a pitch cued by perceived inter- 
mittency, it may be argued that some 
sort of crude time analyzing system 
enters in indirectly to determine cer- 
tain low pitches. Goldstein, Kiang, 
and Brown (1959) have pointed out 
the similarity of the upper limit of 200 
per second found by Miller and Taylor 
(1948; for matching of thermal noise 
bursts to pure tones) to the upper 
limit for synchronized following by 
cortical responses to input pulses of 
thermal noise. Perhaps, then, the cue 
for pitch from intermittency is related 
to intermittency in cortical response. 
The neural mechanism by which this 
intermittency is utilized as a cue is 
not understood at present. From our 
results obtained with double pulse 
trains, one would conclude that the 
mechanism is not simple. We have 
used the term “perceived intermit- 
tency,” not to imply the necessity for 
a mysterious perceiver, but only to 
imply that certain stimuli which 
produce a perceived intermittency can 
also act as cues to low pitch. Not all 
stimuli which produce perceived inter- 
mittency necessarily would produce a 
musical pitch. A very important 
upper limit in our theory would be the 
upper limit of perceived intermittency 
in voices and instruments which have 
come to produce a mediating response, 
such as singing, in the subject. 


SUMMARY AND CONCLUSIONS 


Some subjects can hear a low pitch 
to certain complex auditory stimuli 
even when there is no physical fre- 
quency corresponding to this low 
pitch. It has been the hypothesis 
of the present paper that the perceived 
low pitch is very often related to past 
learning of vocal matching responses 
to various musical stimuli—voices, 
and instruments. The subjects with 
extensive vocal experience, when 
asked to judge the pitch of a complex 
stimulus, tend to automatically make 
overt humming responses, or to “hear 
themselves” hum subvocally. The 
pitch of their hum then is attributed 
to the complex sound as its pitch. 

As a first step in proving the 
reasonableness of this hypothesis, we 
have carried out an experiment which 
showed that training of vocal match- 
ing responses to stimuli with a com- 
plex spectrum (but with a pattern 
in their harmonics) significantly in- 
creased the ability to hear a low pitch 
in these stimuli. We have used as a 
criterion for hearing a low pitch the 
matching of the perceived pitch to 
that of a pure tone of low frequency 
(not included in the complex spec- 
trum). The matching of a low pure 
tone to the fundamental frequency of 
a mediating hum response undoubt- 
edly depends on previous learning 
also. 

Just as the subjects with vocal 
training can match a complex stimu- 
lus, with a pattern of harmonics simi- 
lar to that of their voice, by humming 
or singing, so they can match with 
their voice a stimulus which has a 
perceived intermittency similar to 
their voice. Thus one can account 
for the fact that many subjects with 
vocal training can hear a low pitch in 
certain high frequency, intermittent 
stimuli. Here again we have used 
matching of pitch to a low pure (non- 
intermittent) tone as a criterion of 
hearing a low pitch. While not the 
only criterion that might be em- 
ployed, it has the advantage of 
distinguishing pitch as measured in 
this way from perception simply of 
intermittency. 

Further experiments with trains of 
pulses led us to the conclusion that 
there is no precise time analyzing 
system present for determining low 
pitch. 


DNA AND RNA AS MEMORY MOLECULES¹ 


JOHN GAITO 


Kansas State University 


The evidence relative to the involvement of DNA and RNA in memory 
is considered. Because of its marked stability DNA is not too likely 
a candidate as a memory molecule; however, it should not be excluded 
completely in that some evidence suggests less stability in neural DNA 
than in nonneural DNA. Evidence concerning RNA is more favorable 
but is subject to multiple interpretations. The great lability of RNA 
argues against it being the memory mechanism. Six or more RNA fractions 
complicate the problem. Thus, there is no conclusive evidence 
to indicate that either of the nucleic acids is the memory molecule. 


As long as Man has been capable of 
speculating about himself and the Uni- 
verse, probably one of the most per- 
plexing problems which has plagued him 
has been that of explaining the mecha- 
nism whereby the physical energies of 
the external world are transformed by the 
organism into representative processes to 
symbolize experiential events during 
learning. This problem has confounded 
philosophers and scientists for many 
centuries (e.g., see Boring, 1950). Even 
though some progress has been attained, 
many answers still elude the scientist. 

Biological approaches which have been 
concerned with the problem of learning 
and memory have been either of a 
neurological or biochemical nature. Ex- 
amples of the neurological approach in- 
clude Lashley (1929), Kohler (1940), and 
Hebb (1949). The biochemical treat- 
ment is illustrated by Rosenzweig, 
Krech, and Bennett (1960) and Overton 
(1959). 

¹ Throughout this paper the following abbreviations 
will be used: DNA (deoxyribonucleic 
acid), RNA (ribonucleic acid), A 
(adenine), T (thymine), G (guanine), C 
(cytosine), U (uracil), and RNase (ribonuclease). 


This paper will be concerned with 
another biochemical approach, but at a 
more molecular level than that of 
Rosenzweig et al. and Overton. Such 
an approach is not a novel one. A number 
of people have expressed the idea 
that memory involves a molecular change 
in certain tissue, e.g., Halstead (1951), 
Katz and Halstead (1950), Pauling and 
Weiss during the Hixon Symposium 
(Jeffress, 1951), Gerard (1960), and 
Schmitt (1962). In a recent book on the 
nature of chemical bonding, Pauling 
(1960) concludes with, 


I believe that thinking, both conscious and 
unconscious, and short term memory involve 
electromagnetic phenomena in the brain, 
interacting with the molecular (material) 
patterns of long-term memory, obtained from 
inheritance or experience . . . [p. 570]. 


In a previous paper (Gaito, 1961), a 
possible mechanism for the memory func- 
tion resulting from learning was sug- 
gested. This involved a change at one or 
more loci in DNA, RNA, or amino acid 
molecules. These ideas were wholly 
speculative when first presented. How- 
ever, some exciting research has been 
conducted in the last few years which is 
pertinent to the possible involvement of 
DNA or RNA in memory functions. 
We would like to consider these data. 


INTERRELATIONSHIP OF DNA, RNA, 
AND AMINO ACIDS 


The interaction of DNA, RNA, and 
amino acids in protein synthesis has been 
described frequently (Hurwitz & Furth, 
1962; Ochoa, 1962; Rich, 1962). The 
basic information (genetic code) in DNA 
is transmitted to RNA in the nucleus. 
The two stranded DNA molecule di- 
vides; one of these strands then forms 
a hybrid two stranded molecule with 
messenger RNA which forms as a 
complement of DNA. Thus if the one 
strand of DNA has the linear sequence 
ATTGC ..., messenger RNA would 
consist of UAACG. ... 
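
The complementary relation in this example can be written out as a small sketch. It encodes only the base pairing implied by the abbreviations used in this paper (A with U, T with A, G with C, C with G); the function name is hypothetical.

```python
# Complementary RNA base for each DNA base, as implied by the example above.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def messenger_rna(dna_strand):
    """Return the messenger RNA sequence complementary to a DNA strand."""
    return "".join(DNA_TO_RNA[base] for base in dna_strand.upper())

print(messenger_rna("ATTGC"))
```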

Messenger RNA supervises the join- 
ing of amino acids to form proteins in or 
on the ribosomes of the cytoplasm. The 
transfer of RNA from nucleus to cyto- 
plasm can be demonstrated by radio- 
graphic techniques. In the synthesis of 
protein, an RNA of 50 to 100 base units 
(soluble or transfer RNA) gathers an 
amino acid and attaches itself to its 
appropriate site on messenger RNA. 
There are supposed to be different solu- 
ble RNA's for each amino acid. Each 
soluble RNA terminates in the sequence 
CCA at one end. Thus many soluble 
RNA's with their associated amino acids 
line up on messenger RNA and the 
amino acids become attached to form the 
specified protein. 

A coding procedure of a minimum of 
three nucleotides for each amino acid 
has been considered to be most plausible 
(Crick, 1962; Crick, Barnett, Brenner, & 
Watts-Tobin, 1961; Rich, 1962). Crick 
(1962) suggested that: 


The message is read in nonoverlapping groups 
from a fixed point . . . . in groups of a fixed 
size that is probably three, although multiples 
of three are not completely ruled out... . 
There is very little nonsense in the code. 
Most triplets appear to allow the gene to 
function and therefore probably represent an 
amino acid. Thus in general more than one 
triplet will stand for each amino acid [p. 74]. 


Matthaei, Jones, Martin, & Nirenberg 
(1962) have developed tentative codes 
for most of the 20 amino acids by using 
synthetic polyribonucleotides to syn- 
thesize protein. For example, UUU rep- 
resents phenylalanine; two C’s and one 
U, proline; etc. The code is degenerate, 
i.e., more than one triplet can code a 
single amino acid. For example, two 
U's and a C or a G is considered the code 
for leucine. 
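
Degeneracy in the tentative codes can be illustrated with a minimal lookup. Because the text gives only the base composition of each triplet and not the order of the bases, the sketch keys each entry by its sorted letters; this keying and the function names are assumptions made for illustration, and only the assignments quoted above are included.

```python
# Tentative assignments quoted in the text, keyed by base composition
# (letters sorted alphabetically), since base order is not stated.
COMPOSITION_CODE = {
    "UUU": "phenylalanine",  # three U's
    "CCU": "proline",        # two C's and one U
    "CUU": "leucine",        # two U's and a C
    "GUU": "leucine",        # two U's and a G
}

def amino_acid_for(triplet):
    """Look up a triplet by composition; returns None if not listed."""
    key = "".join(sorted(triplet.upper()))
    return COMPOSITION_CODE.get(key)

def compositions_for(amino_acid):
    """All listed compositions coding the same amino acid (degeneracy)."""
    return sorted(k for k, v in COMPOSITION_CODE.items() if v == amino_acid)

print(compositions_for("leucine"))
```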


Even though the trinucleotide coding 
is most accepted, other coding procedures 
have been suggested. Roberts (1962) in- 
dicated that a doublet code would elimi- 
nate degeneracy aspects. In this code a 
G with a C represents alanine; G and U, 
valine; etc. However, this coding pro- 
cedure results in ambiguities, e.g., AA 
codes lysine and methionine. 
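
A simple count shows why a doublet code cannot avoid ambiguity. The sketch below assumes a four-letter alphabet and 20 amino acids, as stated above; the function names are hypothetical.

```python
def codeword_count(bases, length):
    """Number of distinct ordered codewords of a given length."""
    return bases ** length

def minimum_length(bases, amino_acids):
    """Shortest codeword length giving at least one codeword per amino acid."""
    length = 1
    while codeword_count(bases, length) < amino_acids:
        length += 1
    return length

print(codeword_count(4, 2), codeword_count(4, 3), minimum_length(4, 20))
```

With fewer doublets than amino acids, at least one doublet must stand for more than one amino acid, whereas triplets leave codewords to spare and so permit degeneracy.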

An interesting “Book Model” of cod- 
ing and genetic information transfer has 
been proposed by Platt (1962). He uses 
the analogy of a complex instruction 
manual in which “information” is linearly 
arranged in “words” that are “read out” 
sequentially in time. He relates in a 
clever fashion the various aspects of 
books and printing procedures to the 
DNA, RNA, and amino acid interaction. 


DNA AS A POTENTIAL MEMORY 
MOLECULE 


In that it has been firmly established 
that the linear sequence of bases in DNA 
constitutes the genetic code, it is reason- 
able to expect that the linear sequence in 
neural DNA or RNA should provide an 
experiential or memory code. A main re- 
quirement of this expectation is that the 
nucleic acids need be labile because it is 
necessary for the molecule to be capable 
of being changed. However, the mole- 
cule should not be overly labile or 
memory would be changing drastically 
and make for chaotic behavior. 

Of the two nucleic acids, DNA is the 
less labile. The average DNA content 
appears to be relatively stable even dur- 
ing marked physiological alterations of 
cells (Alfert, 1957). White, Handler, 
Smith, and Stetten (1959), after review- 
ing the work of a number of investiga- 
tors, concluded that DNA was formed 
to an appreciable extent only during 
active mitosis by a cell. 

Base analogues have been used fre- 
quently in the DNA of bacteriophages to 
produce base changes (Sinsheimer, 1960). 
For example, 5-bromouracil and 2-amino- 
purine are presumed to act to bring about 
the replacement of the A-T pairs by G-C 
ones, and vice versa. These agents also 
bring about a reversion of these changes. 
Freese (1961) has indicated that low 
pH, ethyl ethane sulfonate, and other 
agents, will cause transitions from one 
pair to the other. 

Benzer (1961) has made a detailed 
examination of a small portion of the 
genetic map of bacteriophage T4, a por- 
tion which controls the ability of the 
phage to grow in E. coli. He indicated 
that A-T pairs are held much less strongly 
than are G-C pairs which suggests that 
in mutation the A-T pairs (“hot spots”) 
will be more subject to substitution. 
There are two hydrogen bonds for A-T 
pairs but three for G-C. He stated that 
A-T pairing would change to A-G and 
thence to G-C. 
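
The bond counts given above (two for A-T, three for G-C) lend themselves to simple arithmetic. The following is a minimal sketch, assuming standard complementary pairing along the whole duplex and treating the number of hydrogen bonds as only a crude proxy for how strongly a pair is held; the function names are hypothetical.

```python
# Hydrogen bonds per base pair as stated in the text: each base is assumed
# to be paired with its complement (A with T, G with C).
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bond_count(strand):
    """Total hydrogen bonds holding a duplex, given one of its strands."""
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())


def weakest_positions(strand):
    """Indices of A-T pairs, the positions with the fewest hydrogen bonds."""
    return [i for i, base in enumerate(strand.upper()) if base in "AT"]
```

For a short strand such as `"GATC"`, the two middle positions are the A-T pairs and so are the ones this sketch flags as most weakly held.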

Thus DNA is a stable molecule, altered 
only by mutagenic agents. This fact 
appears to obviate its acting as a memory 
mechanism; however, we should not dis- 
count it completely for several reasons. 
First, most of the DNA which has been 
investigated has been from nonneural 
tissue. It is possible that the stability of 
DNA in nerve cells is different than it is 
in other cells and neural DNA may be 
more conducive to modification. 

Some attention has been devoted to the 
gross DNA content of neural tissue. 
One study, by Mandel, Harth, and 
Borkowski (1961), has indicated that the 
highest DNA content was in the grey and 
white matter of the cerebellum, and in 
the olfactory bulb. The lowest amounts 
were found in the spinal bulb, the mesen- 
cephalon, and the thalamus. Moderate 
amounts were in the white and grey 
matter of the cerebrum, the hypothala- 
mus, the hippocampus, and the corpus 
striatum. The amount of DNA indicates 
the richness of the different areas in 
nuclei. Vladimirov, Barnov, Pevzner, 
and Tsyn-Yan (1961), reported that the 
amount of DNA was the same in the 
motor, visual, and auditory areas in 
Layer 2 of cat cortex. Under hypoxic 
conditions a significant decrease occurred 
in the motor and visual areas but not in 
the auditory area. 

Other research indicates a moderate 
degree of lability in neural DNA. 
Koenig (1958), using radioactivity trac- 
ing techniques, obtained results which 
suggested greater DNA lability in the 
central nervous system tissue and related 
tissue than has generally been assumed. 
The results indicated a slow, but definite, 
turnover of DNA in nondividing cells 
of the brain and the walls of cerebrospinal 
blood vessels. 

Another consideration that argues against 
excluding DNA as a potential memory mechanism 
is that several types of DNA have 
been reported. Swift (1962) concluded 
from his studies on DNA in species of 
flies that there are two types of DNA: 
one which is constant from cell to cell and 
another varying in particular cell types 
at particular stages of ontogeny. Like- 
wise, Bendich, Russell, and Brown (1953) 
found two types of DNA in growing rat 
tissue. Thus, the varying DNA might 
be involved in memory functions. 

In any event, the stability of the DNA 
molecule does not necessarily preclude it 
operating as a source for memory 
changes. In fact its stability would 
appear to be an argument in its favor. 
The experiential code requires great 
stability. If memory were maintained 
by molecules which were too labile, it 
would change rapidly causing chaotic 
behavior. 


RNA AS A POTENTIAL 
MEMORY MOLECULE 


Information concerning the possibility 
of RNA playing a role in memory appears 
to be more encouraging to some in- 
dividuals because of its great lability. 
RNA varies from cell to cell and is very 
active metabolically (Ris, 1957). Sins- 
heimer (1960) reported that the amount 
of RNA in the salivary gland of Drosophila 
drops rapidly during the early 
stage of differentiation whereas the 
amount of DNA increases, and changes 
in the overall nucleotide composition of 
RNA of Chlorella cells occur during 
starvation. LeBaron (1959) indicated 
that other cytochemical research provides 
evidence for increased activity of cellular 
RNA and proteins. He concluded that 
“There is certainly ample evidence for 
the active turnover of various lipide, 
protein and nucleic acid structural con- 
stituents, and the possibility exists that 
there is some alteration of this turnover 
on stimulation [p. 597].” 

Nitrous acid has been used as a 
mutagen with tobacco mosaic virus RNA 
to bring about base changes. This 
reagent substitutes hydroxyl groups (OH) 
for amino groups (NH2). Strauss (1960) 
has indicated that nitrous acid reacts 
with nucleic acids containing adenine, 
guanine, and cytosine and converts them 
to the corresponding hydroxyl com- 
pounds, hypoxanthine, xanthine, and 
uracil. These results imply that the re- 
action of nitrous acid with nucleic acids 
produces base analogues which result in 
mutation upon the duplication of genetic 
material. Tsugita and Fraenkel-Conrat 
(1960) have shown that nitrous acid 
alters the composition of RNA of tobacco 
mosaic virus and that the resulting pro- 
tein of the mutant differed from the 
parent strain with three amino acids 
being replaced by three others (proline, 
aspartic acid, and threonine by leucine, 
alanine, and serine). 
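
The conversions attributed above to nitrous acid amount to a fixed substitution table, which can be sketched as follows. This is an illustration only: it assumes that every susceptible base is converted, whereas a real treatment would alter only some of them, and the names `DEAMINATION_PRODUCT`, `deaminate`, and `changed_positions` are hypothetical.

```python
# Conversions reported in the text for nitrous acid acting on nucleic acid bases.
DEAMINATION_PRODUCT = {
    "adenine": "hypoxanthine",
    "guanine": "xanthine",
    "cytosine": "uracil",
}


def deaminate(bases):
    """Return the bases after every listed base has been converted.

    Bases not in the table (for example uracil, which carries no amino
    group) are returned unchanged. Complete conversion is an assumption.
    """
    return [DEAMINATION_PRODUCT.get(base, base) for base in bases]


def changed_positions(bases):
    """Indices at which deamination would alter the base."""
    return [i for i, base in enumerate(bases) if base in DEAMINATION_PRODUCT]
```

Applied to the sequence adenine, uracil, cytosine, the sketch yields hypoxanthine, uracil, uracil, so the first and third positions are the ones altered.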

The RNA content of brain tissue has 
been studied by Mandel et al. (1961). 
They reported that the highest RNA con- 
tent was in the olfactory bulb, the grey 
matter of the brain cortex, and the cerebellum, 
the hypothalamus, and the 
hippocampus. Lower amounts were 
found in the corpus striatum, the thala- 
mus, and the white matter of the 
cerebrum and the cerebellum. The low- 
est figures occurred in the mesencephalon 
and the spinal bulb. The greatest RNA 
turnover was found in the olfactory bulb, 
the hypothalamus, and the grey matter 
of the brain cortex and the hippocampus. 

More pertinent to the relationship be- 
tween RNA and memory is the work of 
Hydén and a number of other investiga- 
tors. Hydén (1959) has demonstrated 
that RNA is produced in the nerve cells 
at a rate which follows neuronal activity. 
He believed that the nerve cell fulfills its 
function under a steady and rapidly 
changing production of proteins, with the 


2 In some organisms, e.g., tobacco mosaic 
virus, polio virus, and influenza virus, no 
DNA is present. In this case RNA appears 
to be the hereditary material. 
RNA as an activator and governing 
molecule. He hypothesized that memory 
involves a change in the sequence of bases 
in the RNA molecule; this change occurs 
when one or more bases are exchanged 
with the surrounding cytoplasmic ma- 
terials. Hydén reported that individuals 
with certain psychic disorders have 
smaller amounts of RNA and proteins in 
ganglion cells of the central nervous sys- 
tem than do normal individuals (cited by 
Davidson, 1960). Administration of 
malononitrile to these individuals in- 
creased the content of these substances. 
Egyhazi and Hydén (1961) indicated 
that the malononitrile action was due to 
the formation of a dimer of malononitrile, 
tricyano-aminopropene. They reported 
that small amounts of this latter com- 
pound caused a remarkable increase in 
the amounts of proteins and RNA in the 
cell and modified the base composition of 
the RNA with guanine increasing by al- 
most 300%. 

Hydén (1961) reported that the RNA 
content of the nerve cell ranks with the 
highest of all cells in the body. He 
showed that in man the RNA content of 
the motor nerve cells in the spinal cord 
increases from the third year of life to 
age 40, remains constant to about 60, 
and then declines rapidly thereafter. He 
found that if an animal is deprived of 
stimulation in one of the sensory sys- 
tems, e.g., in vision or hearing, the neu- 
rons in that system do not develop bio- 
chemically. The structure appeared 
normal but the nerve cell was im- 
poverished in both RNA and proteins. 

Riesen (1958), in discussing his work on 
light deprivation, referred to Brattgard’s 
findings that the content of RNA and 
proteins decreases in retinal ganglion 
cells with prolonged light deprivation. 
He thought that RNA and protein were 
so highly susceptible to recent prior 
stimulation as to obviate their being con- 
sidered as a mechanism for durable 
memory. Instead he reasoned that they 
might be important for immediate mem- 
ory. Pertinent to this thought is the 
work of Geiger, Yamasoki, and Lyons 
(1956). These individuals stimulated 
the brain cortex of cats and found a 
change of RNA in the stimulated areas, 
which was reversible in minutes. 

Likewise, Morrell (1961) has shown 
gross changes in RNA of nerve cells in 
a cellular learning-like situation. He 
stimulated a portion of the cortex with 
alumina cream. The corresponding tissue 
of the opposite hemisphere showed ac- 
tivity during this stimulation. At first 
the activity in the opposite hemisphere 
appeared only when the stimulated cor- 
tex was active. Soon the tissue in the 
nonstimulated area showed excitation 
spontaneously even when it was isolated 
from the stimulated hemisphere by cut- 
ting its connections. Biochemical analy- 
sis of the neurons in the isolated tissue 
showed a change in the RNA content. 

The above results have been per- 
tinent to sensory stimulation experi- 
ments. However, several experiments 
have been concerned with memory func- 
tions. Indirect support for the RNA 
modification hypothesis is provided by 
the results of Cameron and Solyom 
(1961) and Cameron, Solyom, Sved, and 
Wainrib (1962).3 They found that 
administration of RNA (but not DNA) 
to individuals with presenile, arterio- 
sclerotic, and senile syndromes (with 
some degree of memory impairment) 
brought about memory improvement. 
These changes involved almost total 
retention in some cases. When the RNA 
was discontinued later, memory relapses 
occurred. One might possibly explain 
these results as not involving memory 
per se but as due to the supplying of nu- 
tritional material which has decreased in 
amount with increased age (see Hydén, 
1959). 

Kreps has also reported an altered 
RNA turnover on conditioning (cited by 
Gerard, 1960). 

An interesting experiment by Corning 
and John (1961) is also pertinent. Using 
a classical conditioning procedure, pair- 


3 Unpublished manuscript, 1962, entitled 
“Effects of Intravenous Administration of 
Ribonucleic Acid upon Failure of Memory for 
Recent Events in Presenile and Aged Indi- 
viduals” by D. E. Cameron, L. Solyom, 
S. Sved, and B. Wainrib. Obtainable at 
Royal Victoria Hospital, Montreal, Canada. 
ing light with shock, they conditioned a 
number of flatworms and then transected 
them into head and tail sections. Pre- 
vious experimentation had indicated that 
heads would regenerate new tails, tails 
would regenerate new heads, and both 
would retain some “memory” of the 
avoidance situation. Corning and John 
thought that RNA might play a role in 
the transmission of an acquired structural 
configuration from the trained to the re- 
generating tissues. Thus they reasoned 
that if the trained portions were regener- 
ated in the presence of RNase, the en- 
zyme would affect the altered RNA struc- 
ture, producing some animals with a naive 
head (regenerated portion) and trained 
tails and others with trained heads and 
naive tails (regenerate). They stated 
that the head region would probably be 
dominant and thus the trained head 
animals should show more retention. 
Their results indicated that heads re- 
generated in RNase retained the memory 
as well as did head and tail sections re- 
generating in pond water. However, the 
tails regenerating in RNase performed 
randomly. The authors suggested that 
the RNase did not affect intact tissue 
but did interfere with regenerating tissue 
and maintained that the results are com- 
patible with the assumption that RNA is 
a memory mechanism. 

The idea that changes in the linear se- 
quence of bases in RNA constitute the 
experiential code has been offered in- 
dependently by a number of individuals. 
However, Dingman and Sporn (1961) 
suggested that changes in the helical 
structure and overall configuration, as 
well as sequence changes, could be the 
basis for memory. They performed two 
experiments with radioactive 8-azagua- 
nine injections in rats which were per- 
tinent to the RNA hypothesis. 8-azagua- 
nine was used as an inhibitor of RNA be- 
cause this base analogue had been shown 
to be an inhibitor of enzyme synthesis in 
bacteria. In both experiments paper 
chromatographic procedures indicated 
that the base analogue had been incor- 
porated into the RNA of the brain. In 
neither experiment was there any signifi- 
cant difference between experimental and 
control animals in average time to run 
the maze, suggesting that 8-azaguanine 
had no adverse effect on the motor ability 
of the animals. However, in one experi- 
ment the experimental animals had a 
significantly greater mean number of 
errors than did the controls on all 15 trials 
in the learning of a maze. In another ex- 
periment concerned with retention of a 
maze pattern (tested by a single trial 
after learning a maze), experimental animals 
did not differ significantly from control 
animals, although the experimental animals 
had a greater mean number of errors than 
did the control animals. There were only 
eight animals used in each group (as com- 
pared with 14 and 15 in the learning ex- 
periment); thus it is possible that if n 
had been larger in the retention experi- 
ment, the results would have indicated 
that 8-azaguanine adversely affects both 
learning and retention of maze patterns 
in rats. 
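
The caveat about group size can be made concrete with a rough power calculation. The sketch below uses a normal approximation for a two-sided comparison of two group means with equal group sizes; it is not the analysis Dingman and Sporn performed, and any effect size supplied to it is hypothetical, since none is reported here. The function name `approximate_power` is likewise an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Uses the normal approximation with equal group sizes; effect_size is the
    true difference in means divided by the common standard deviation.
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing beyond either critical value under the shift.
    return (1 - z.cdf(critical - shift)) + z.cdf(-critical - shift)
```

For any fixed nonzero effect, for instance a hypothetical standardized difference of 0.8, `approximate_power(0.8, 15)` exceeds `approximate_power(0.8, 8)`. That is the sense in which eight animals per group could fail to reveal a difference that fifteen per group would detect.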

Based on their results Dingman and 
Sporn maintained that RNA may be 
directly involved in learning but not in 
retention. However, they admitted that 
their results did not necessarily indicate 
that RNA metabolism was intimately 
linked with the formation of memory 
traces in the brain because 8-azaguanine 
might have interfered with metabolic 
processes which affected RNA indirectly. 

Most of the above studies in which 
RNA changes were reported do not ex- 
clude the possibility that the basic 
changes were effected in DNA which then 
brought about changes in RNA. How- 
ever, the results of Cameron, et al.4 and 
of Corning and John (1961) appear to 
argue against the involvement of DNA in 
memory. Yet, as indicated above, the 
Cameron work may be more relevant to 
nutritional needs than to memory. The 
ingested RNA would be degraded into 
the constituents (bases, sugars, and 
phosphates) by enzymatic action be- 
fore incorporation into individual cells. 
These portions might increase the pool 
from which RNA constituents are drawn 
and, thereby, tend to improve the overall 
condition of older individuals who have 


4 See footnote 3. 


less RNA available (Hydén, 1961). In- 
formation relative to this possibility 
might be obtained by using both young 
and aged subjects (some having memory 
deficits whereas others would have no 
memory impairment) and experimental 
tasks varying in the degree of memory 
involvement. 

Furthermore, there are possible meth- 
odological deficiencies in these studies. 
In the Cameron studies there is no indi- 
cation that measures were taken to pre- 
vent bias from affecting the improvement 
ratings of each of the subjects in all the 
experiments. The ratings were based on 
performance on several tasks, supple- 
mented by information supplied by the 
patient and his relatives. As is well 
known, results of drug studies such as 
this can be greatly affected by the atti- 
tudes and expectations of the patient and 
hospital personnel if double blind pro- 
cedures are not employed. However, in 
one of the two experiments reported by 
Cameron and Solyom (1961), double 
blind procedures were used and some im- 
provement reported. On one objective 
test in this experiment, the Wechsler 
Memory Test, scores of the Placebo and 
RNA groups did not differ significantly. 

In the Corning and John experiment a 
qualitative analysis of the data suggests 
that RNase may have sensitized the 
planaria such that one would expect that 
head animals reared in RNase would 
show a greater number of responses to 
light than would tail animals reared in 
this enzyme. One would expect such re- 
sults because the head region contains 
the light receptors and the head portion 
of the head animals would be in the 
RNase for the total regeneration period 
whereas the head region of the tail 
animals would be exposed to the influence 
of the RNase only during the latter por- 
tion of the period. 

Thus, the hypothesis that alterations 
occur in RNA molecules during learning 
situations may appear to have a brighter 
future than the same conjecture relative 
to the DNA molecule. However, even 
though the experimental results tend not 
to negate the involvement of RNA in 
memory, its extreme metabolic lability 
raises some doubt as to how stable mem- 
ory can be handled by an overly reactive 
molecule. Thus Riesen's suggestion that 
RNA subserves an immediate memory 
function may be entirely appropriate and 
require that another molecular mecha- 
nism be postulated for maintaining 
permanent memory. 

An important problem arises in regard 
to the above studies on RNA. These 
studies have been concerned with total 
RNA. Such RNA is a combination of a 
number of RNA's in the cell. There is a 
chromosomal RNA, two RNA fractions 
in the nucleolus (one of which is messenger 
RNA), nucleoplasmic RNA, and soluble 
and ribosomal RNA in the cytoplasm. 
Thus there are at least six RNA fractions, 
one of which may be a memory molecule. 
Assuming that one is important in mem- 
ory functions, information relative to 
total RNA is worthless in that it con- 
founds irrelevant and relevant RNA. 
The important question is, “Which RNA 
is memory RNA?” Messenger RNA 
might appear to be a suitable candidate 
for this role; however, some individuals 
believe that messenger RNA is too labile. 
In one of his recent publications, Hydén 
has suggested chromosomal RNA (Hydén 
& Egyhazi, 1962). 
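
The confounding argument can be illustrated with simple arithmetic: a sizable change in one small fraction produces only a slight change in total RNA. In the sketch below the six fraction names follow the text, but the amounts are entirely hypothetical, chosen only so that they sum to 100, and are not measurements from any study; the function name `total_change` is also hypothetical.

```python
# HYPOTHETICAL amounts in arbitrary units, chosen only so that the total is
# 100. They are not measurements; only the six fraction names follow the text.
HYPOTHETICAL_FRACTIONS = {
    "chromosomal": 2,
    "nucleolar_messenger": 3,
    "nucleolar_other": 5,
    "nucleoplasmic": 10,
    "soluble": 20,
    "ribosomal": 60,
}


def total_change(fraction_amounts, fraction_name, relative_change):
    """Relative change in total RNA when a single fraction changes.

    relative_change is the proportional change in the named fraction
    (0.5 means an increase of 50%).
    """
    total_before = sum(fraction_amounts.values())
    delta = fraction_amounts[fraction_name] * relative_change
    return delta / total_before
```

With these made-up amounts, a 50% rise in the chromosomal fraction shifts total RNA by only 1%, a change that a measurement of total RNA could easily miss or attribute to an irrelevant fraction.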

Investigators are just beginning to con- 
sider the different RNA fractions. The 
content of cytoplasmic RNA of Layer 2 
in cat cortex was higher in the motor 
area than in the visual or auditory cor- 
tices (Vladimirov, et al., 1961). The 
amount in the auditory area was the low- 
est of the three. Under hypoxia signifi- 
cant decreases occurred in the motor and 
visual areas. 

Hydén and Egyhazi (1962) exposed 
young rats to a situation in which they 
had to learn to balance on a wire to reach 
a platform where food was located. 
They found that the cytoplasmic RNA of 
Deiters cells from the vestibular nucleus 
did not differ from that of controls un- 
stimulated or from functional controls 
who were rotated to produce vestibular 
stimulation. In nuclear RNA, there were 
significant differences in base ratios. In 
the experimental group there was a 
greater amount of A and lesser amounts 
of U than in the other two groups. The 
authors maintained that the results in- 
dicated the change of RNA bases during 
learning. However, the base changes ap- 
pear to be related to sensory stimulation 
rather than to learning. The sensory- 
motor activity of balancing on the wire 
to reach the platform provides stimula- 
tion for the cells in the vestibular system 
in the medulla but one would expect that 
any changes representing the learning 
aspects of the activity would be found at 
a higher level in the brain. 

This paper has been concerned with 
the site and mechanism of the memory 
trace. The problem of reactivation of 
this molecular trace is an important re- 
lated point but is a real mystery (Schmitt, 
1962). Assuming that one of the nucleic 
acids is a memory molecule, there is 
some suggestion that a protein might 
function as a regulator molecule, making 
available or unavailable the memory 
code. Huang and Bonner (1962) found 
with pea embryo chromatin that when 
the histone fraction of DNA was re- 
moved, the rate of RNA synthesis in- 
creased fivefold. Huang and Bonner 
concluded that the function of histone 
was to bind DNA and block the transfer 
of information from DNA. Such regu- 
latory action might be relevant to the 
reactivation problem. 

Leslie (1961) has considered histone as 
having other functions. He suggested 
that histone stabilizes RNA so as to pre- 
vent any modification of the latter; his- 
tones separated from their RNA site were 
presumed capable of transforming other 
unprotected RNA molecules of different 
base compositions. 

These results may be relevant to mem- 
ory formation also. Even though there 
is a tendency to relate memory to base 
changes in DNA or RNA, it is possible 
that the nucleic acids are not altered dur- 
ing memory events. One might suggest 
that during stimulation the configuration 
of the histones is modified so as to make 
available the potential inherent in the 
nucleic acids. This modification would 
alter the nucleic acid-histone complex 
and would represent the symbolic rep- 
resentation of the experimental event. 


CONCLUSIONS 


Based on the above discussion, what 
can one conclude? There are three points 
we believe are suggested. 

1. There is no conclusive direct evidence 
to indicate that either of the nucleic 
acids is the memory molecule. Such 
suggestions have been inferential in 
nature. However, in that DNA provides 
a genetic code via the linear sequence of 
bases, it is plausible to expect that DNA 
or RNA provides an experiential code in 
the same way. The results on sensory 
stimulation and memory are consistent 
with this expectation for RNA but do not 
directly show the involvement of the 
nucleic acids in memory events. 

2. The indirect evidence for RNA as 
the experiential code is no stronger than 
it is for DNA. More attention has been 
devoted to RNA than to DNA. This is 
probably due to the great stability of 
DNA and to the fact that it is the genetic 
code. Thus DNA should not be excluded 
as the memory molecule even though 
investigators prefer RNA. 

3. If RNA is the memory molecule, 
which RNA is it? Furthermore, what 
is the exact mechanism? The linear se- 
quence of purines and pyrimidines ap- 
pears to be the most plausible mecha- 
nism in that these sequences provide the 
basis for genetic coding in DNA. How- 
ever, the suggestions of Dingman and 
Sporn (1961) that the helical structure 
and overall configuration are other pos- 
sible mechanisms should be seriously 
considered. 

Thus, there is definite evidence to in- 
dicate that some gross changes in DNA 
and RNA can occur during stimulation. 
However, no one has directly detected 
a change of submolecular structure in 
either DNA or RNA such as discussed 
above; all the evidence for changes is 
of an indirect nature. The hypothesizing 
about nucleic acids still remains in the 
realm of speculation. The validity of 
these hypotheses must await further re- 
search by biochemists, psychologists, 
neurophysiologists, and others in related 
areas. 

DNA is more homogeneous than is 
RNA and should be easier to evaluate. 
We believe that a plan of research should 
begin with an evaluation of DNA. Such 
a program of research is under way at 
Kansas State University using the facili- 
ties of the Psychology Department and 
the Bacteriology Department. C. W. 
Dingman’ of the National Institutes of 
Health is also concerned with the role of 
DNA (and protein complex) in memory. 
A number of other psychologists and bio- 
logical science teams are attacking the 
problem of the role of RNA. Exciting 
research results such as presented by 
molecular biologists in the last decade 
should be provided during the next 
decade by psychologists and other bio- 
logical scientists. 


GENERALITY OF HONESTY RECONSIDERED 


ROGER V. BURTON 1 
National Institutes of Health, Bethesda, Maryland 


The conclusion by Hartshorne and May (1928) regarding the specificity 
of moral behavior has been reconsidered in light of reanalyses of their 
data using factor analysis and Guttman’s (1955) simplex model. Other 
studies relevant to the issue of specificity versus generality of moral 
behavior were also considered. The evidence indicates there is some 
underlying generality in moral behavior although there is still much of 
the variance of the honesty tests due to specific test determinants. A 
model is proposed to account for the findings. This model involves 2 
generalization gradients: a gradient involving just the stimulus elements 
in a particular situation, and a gradient pertaining to verbal mediation 
in which certain cognitive elements are abstracted from one situation 
and generalized to a different and perhaps entirely new setting. At- 
tention was given to the organizing and heuristic value of the model. 


NOVEMBER 1963 


In the 1920s there was a substantial 
amount of research dealing with the 
complex domain of honesty, or moral 
behavior. This research reached a 
culmination in the classic studies of 
Hartshorne and May (1928) and 
their collaborators (Hartshorne, May, 
& Maller, 1929; Hartshorne, May, 
& Shuttleworth, 1930). After their 
efforts, investigation into this area 
of human behavior concentrated on 
exploring the cognitive structure and 
development of morality, while studies 
of overt choice behavior in test situa- 
tions declined until relatively recently. 
The loss of enthusiasm for this area 
of research may have been due to the 
very thorough, excellent job done by 
Hartshorne and May. Another rea- 
son for the turning away from honesty 
as a subject for study may have been 

1 I am especially indebted to John D. Camp- 
bell and to two anonymous reviewers for their 
constructive comments on this paper. 
the conclusion from their study that 
conflict between honest or deceitful 
behavior is quite specific to each 
situation, that one could not general- 
ize about a subject’s honesty from a 
few samples of his behavior. 

In the 1950s there was a renewed 
interest in the area of morality, es- 
pecially in the developmental aspects 
of such behavior. Those working in 
this area must take into account the 
generality of their findings, especially 
when they utilize only a single be- 
havioral test of honesty. It is the 
purpose of this paper to reconsider 
the specificity conclusion by looking 
again at the Hartshorne and May 
data and other evidence relevant to 
this question. 


Generality versus Specificity 


The two extreme points of view 
about honest behavior can be quickly 
sketched. The unidimensional ap- 
proach holds that a person is, or 
strongly tends to be, consistent in his 
behavior over many different kinds of 
situations. Thus a person who lies 
in one situation is not only likely to 
lie in other situations, but is also 
highly likely to cheat, steal, not feel 
guilty, and so on. This conception 
of the generality of character has 
been more fully presented by Mac- 
Kinnon (1938) in a study supplying 
empirical support for such an inter- 
pretation. The graduate student 
subjects in his study who cheated on a 
problem solving task by copying also 
tended to lie about their behavior, 
to report that they rarely felt guilty, 
and to perceive the task as being un- 
fair. The students who did not cheat 
reported that they often felt guilty 
even when they were not aware of 
having transgressed, and that they 
perceived themselves as inadequate 
to solve the task rather than that the 
task was unfair. The conclusion is 
that these findings demonstrate a 
consistency in personality, and that 
one is justified in drawing conclusions 
from relationships between one sam- 
ple of honest behavior and other 
measures relevant to one’s investiga- 
tion. MacKinnon recognized that 
he did not test over different kinds of 
situations, but he argued that the 
consistency he found was sufficient to 
support his interpretation. In gen- 
eral, this interpretation is consonant 
with most psychoanalytic formula- 
tions of superego behavior based on 
an identification hypothesis. As Mac- 
coby (1959) has pointed out, theo- 
retical formulations stemming from 
Piaget’s (1932) schema also conform 
to this “unitary process” conceptual- 
ization of morality. 

The doctrine of specificity of moral 
behavior holds that a person acts in 
each situation according to the way 
he has been taught to act under these 
particular conditions. The predicta- 
bility of one’s moral behavior from 
one situation to another depends on 
the number of identical elements 
which the two settings share. This 
formulation does not accept the ab- 
stract concept of “honesty” as a 
valid character trait, but instead 
argues that there are many different 
kinds of specific behaviors which 
tend to be independent even though 
they may be included under the same 
rubric. Therefore, knowing that a 
person has cheated in a final examina- 
tion in no way permits one to predict 
what the same person would do if 
tempted to cheat in a different setting 
such as a competitive game or busi- 
ness venture. Furthermore, there is 
little if any association between the 
extent to which a person will experi- 
ence anxiety following a deviation in 
one moral area and the intensity of 
guilt following deviation in a different 
area. The study reported by Allin- 
smith (1960) reflects to some extent 
this interpretation. Utilizing a story 
completion method for his measures 
of moral behavior, he found there 
was little consistency in the intensity 
of guilt expressed in junior high school 
students over different transgressions. 
He also found a noticeable lack of 
correlation between the measures of 
guilt and the measures of resistance 
to temptation. He concludes, there- 
fore, that there are specific “guilts” 
which tend to be unrelated to resist- 
ance to temptation rather than a 
unified character trait representing 
an individual’s morality. 

The reader may note that the two 
studies cited have measured both the 
tendency to deviate in a temptation 
situation and the reaction to having 
already deviated. This simultaneous 
consideration of resistance to tempta- 
tion and of guilt is customary in 
studies addressed to the development
of moral behavior, of the superego, 
or conscience. However, there seems 
good reason, both theoretically and 
empirically (Burton, 1959; Burton, 
Maccoby, & Allinsmith, 1961), to 
consider these aspects of morality 
separately and then to investigate the 
extent of the correspondence between 
them. This paper is addressed pri- 
marily to a consideration of the 
generality of resistance to temptation 
as measured by behavior in lifelike 
temptation settings, and also by ques- 
tionnaires and projective techniques. 


Studies in Deceit 


Hartshorne and May’s Studies in 
Deceit (1928) is undoubtedly the most 
comprehensive and well-known study 
of temptation and cheating behavior. 
One of the most important conclusions 
from this study was that there was no 
general trait of honesty. Consistency 
of behavior from one situation to 
another was due to similarities in the 
situations and not to a consistent 
personality trait in people. However, 
these authors did recognize that there 
seemed to be some similar overlapping 
elements in all the test situations: 


It may be contended of course that as a 
matter of fact we rarely reach a zero correla- 
tion, no matter how different may be our 
techniques, and that this implies some such 
common factor in the individual as might 
properly be called a trait. We would not 
wish to quarrel over the use of a term and are 
quite ready to recognize the existence of some 
common factors which tend to make indi- 
viduals differ from one another on any one 
test or on any group of tests. Our contention,
however, is that this common factor is not an 
inner entity operating independently of the 
situations in which the individuals are placed 
but is a function of the situation in the sense 
that an individual behaves similarly in dif- 
ferent situations in proportion as these situa- 
tions are alike, have been experienced as 
common occasions for honest or dishonest 
behavior, and are comprehended as oppor- 
tunities for deception or honesty [p. 385]. 


The emphasis, then, was on the 
specificity of each test situation which 
involved different motives, different 
values in conflict, and—most im- 
portantly—different learned responses 
for that particular setting. The 
basis for their conclusion was that 
the correlations between the cheating 
tests were too low to produce evidence 
of a unified character trait of honesty 
or deceitfulness.

Table 1 gives the intercorrelations 
as reported in Studies in Deceit. The 
upper half (summed scores) presents 
the intercorrelations of the types of 
deceptive behavior in which each 
person’s score on a particular kind of 
test is summed with his scores on 
the same kind of test to give a single, 
composite score for that type of 
cheating. The top diagonals are the 
reliabilities for these summed scores. 
The lower half (average correlations) 
of the table gives the average cross- 
correlations between single tests of 
different techniques. The diagonals 
of this half of the table are the average 
correlation of one kind and type of 
test with the other tests of the same 
kind and type. Thus, .871 is the 
reliability of the summed score for 
the three copying tests, .450 is the 
correlation between the summed 
scores for the three copying and the 
six speed tests, .696 represents the 
average correlation for the three copy- 
ing tests, and the average correlation 
among these three copying tests with 
the six speed tests is .292. 

TABLE 1

INTERCORRELATIONS OF HARTSHORNE AND MAY

                         A       B       C       D       E       F       G       H       I
A. Copying (3 tests)  (.871)   .450    .400    .400    .172    .288    .118    .143    .350
                      (.696)
B. Speed (6 tests)     .292   (.825)   .374    .425    .193    .345    .169    .173    .248
                              (.440)
C. Peeping (3 tests)   .285    .219   (.721)   .300    .234    .100    .250    .200    .108
                                      (.462)
D. Faking (3 tests)    .291    .255    .196   (.750)    --     .300    .122    .346    .256
                                              (.500)
E. Home (1 test)       .154    .141    .187     --    (.240)   .142   -.015   -.010    .400
F. Athletic (4 tests)  .198    .194    .062    .184    .087   (.772)   .118    .283    .230
                                                              (.458)
G. Parties (3 tests)    --      --      --      --      --      --      --     .210   -.004
H. Stealing (1 test)   .127    .128    .160    .283   -.010    .162     --      --     .132
I. Lying (1 test)      .312    .254    .161    .208    .400   -.003     --     .132   (.836)

Note. Hartshorne and May, 1928, Book II, pp. 122, 123, 212.
The upper half is based on summed scores for each type of test.
The lower half was computed by averaging the correlations for each type of test.
Reliabilities for each method of computing these scores are in the diagonal.


The individual’s score for each
test contributing to these correlations
was determined as follows. First,
a distribution of performance scores
under carefully supervised (i.e.,
enforced honesty) conditions was
obtained, and the mean and standard
deviation of this distribution were
computed. The scores for some tests
(e.g., Copying, Speed, and Athletic)
were the differences between
performance at Time 1 and performance
at Time 2. For the other tests (e.g.,
Peeping, Puzzle, and Lying), the
scores were based on a single
performance. The mean as thus
determined for each test became the
reference point for honest behavior,
and the standard deviation became the
unit of measurement. When a subject
was then tested for deception, his
raw test score was converted into the
number of standard deviation units
it was away from this previously
established mean of honest behavior.
For example, the mean for the changes
on the Arithmetic Copying test was
a gain of 1.06 with a standard
deviation of 3.10. If a person obtained
a change in score of a loss of 10, his
converted score would become -3.57,
which is the score used in computing
the correlations with the other tests.2


2 Hartshorne and May also used a “fact”
score as well as this “amount” score. They
arbitrarily decided that any score which was
three or more standard deviations from the
honest mean was labeled a “cheat.” The fact
score was used in reporting the percentages
of cheaters on independent variables such as
age, sex, ethnicity, etc.


Looking at the intercorrelation
table, it is seen, as Hartshorne and
May pointed out, that the sizes of the
correlations tend to decrease as the
similarities of the situations decrease.
This certainly supports their argument
that there are factors in the
temptation situation which influence
the behavior of the child irrespective
of any proclivity for cheating or
resistance he brings with him. They
also had evidence showing that
variables external to the individual,
such as ease of cheating, extent of the
risk involved, and magnitude of the
deviation required for success, affect
the probability that one will cheat and
also the extent to which one will
deviate.

However, it is striking that almost
all of the correlations are positive and
that most of the very low correlations
are contributed by tests with very
low or unknown reliabilities.
Furthermore, with the low reliabilities
of some of the tests, the
intercorrelations (especially in the
upper half) are relatively high.
Consideration of these facts suggests
that a re-examination of the authors’
rejection of an underlying character
trait of honesty in temptation
situations is warranted.3
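The conversion of raw scores into deviation units, described above for the Arithmetic Copying test, can be illustrated with a short sketch. The function name `amount_score` is a label chosen here for illustration and is not Hartshorne and May's; the numbers are those given in the text.

```python
# Sketch of the "amount" score: the distance of a raw score from the mean
# obtained under enforced-honesty conditions, in honest-condition SD units.
# The function name is hypothetical; the formula follows the description in the text.
def amount_score(raw_change, honest_mean, honest_sd):
    return (raw_change - honest_mean) / honest_sd

# Arithmetic Copying example: honest mean gain 1.06, SD 3.10, observed loss of 10.
arithmetic_copying = amount_score(-10, 1.06, 3.10)
print(round(arithmetic_copying, 2))
```

The same function applies to the single-performance tests once their honest-condition mean and standard deviation are supplied.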


Principal Component Analysis


Hartshorne and May (1928) state 
in a footnote when discussing the data 
on specificity of conduct that “Spear- 
man’s criterion of the presence of a 
common factor was not applied to 
these inter-r’s, as they were not ob- 
tained from the same cases through- 
out [Book II, p. 215].” Unfortunately,
they do not report the size of
the samples for the correlations. But 
from their tables giving the numbers 
of students in each school who were 
given each test, it seems safe to esti- 
mate that the sample sizes ranged 
from at least 75 to over 6,000, with 
most of the reported correlations 
being based on samples between 200 
and 350 in size (Hartshorne & May, 
1928, Book I, pp. 107-108). With 
these samples drawn from similar 
populations without apparent system- 
atic bias, it would seem that the 
correlations reported are fairly good 
approximations to the true population 
values. If this assumption is ac- 
cepted, there is no problem in per- 
forming a principal component analy- 
sis on these matrices. To be more
confident of the stability of the results,
only tests with reliabilities of at least
.70 for the summed scores were
included. Therefore, only the Copying
(A), Speed (B), Peeping (C),
Faking (D), Athletic (F), and Lying
(I) tests were chosen.


3 J. Merrill Carlsmith and David G.
Beswich also considered this issue and carried
out an analysis using Thurstone’s centroid
method. Their results are essentially the
same as those reported in this paper. By
coincidence their analysis was made
simultaneously with that reported here. I am
most grateful to them for making their
results available for this paper.


Descriptions of the Tests 


The copying tests were intelligence 
tests which involved the child's writ- 
ing answers to questions, having the 
papers collected and copies of each 
paper made, returning the papers 
to the pupils, and having them correct 
their own papers. Their cheating 
score consisted of any changes they 
made on their papers. 

The speed tests consisted of simple 
tasks such as number and digit can- 
cellation. Three short forms of these 
tests were given, two under honest 
conditions called “practice” sessions, 
and the last was the test. The pupil 
was permitted to score his own test 
on the last administration. The score 
was the amount of increase on the last 
relative to the second trial. 

Peeping tests involved tracing 
mazes or marking ‘“x’s” in circles 
while keeping one’s eyes closed. De- 
ception was determined by comparing 
the child's performance against a 
norm established under honest con- 
ditions. 

The puzzles used in the faking 
solutions tests were either impossible 
to solve or extremely difficult. To 
achieve a satisfactory solution in the 
time given, a child would have to 
cheat. His deception score consisted 
of how closely he approached a per- 
fect solution. 

The athletic tests were a dyna- 
mometer test for measuring strength 
of hand grip, a spirometer test for 
measuring lung capacity, a chinning 
test, and a standing broad jump. 
Each student was tested privately. 
Three “warm-up” trials were given 
in the presence of the tester who 
recorded the student's best performance.
The child was then left alone
to record his own score on the next 
five trials. Previous standardization 
of performance on the tests permitted 
scoring improbable achievement dur- 
ing the test trials compared with the 
best score on the first three trials. 

The lying test consisted of ques- 
tions about the child’s personal con- 
formity to socially approved morality. 
After failing to standardize the test 
on school classes, the test was stand- 
ardized on a class of graduate students 
who attempted to answer truthfully 
about their own childhood. This 
standard was then used to determine 
the deception score for the school 
pupils.4

Lawley’s (1940) test of significance 
(also see Maxwell, 1961) was applied 
to these matrices and indicated that 
the sample size would have to be 
at least n of 26 for all the matrices 
to be significant at the .001 level. 
Since all these correlations are based 
on at least 200 pupils, these matrices 
are statistically significant and justify 
the extraction of common variance. 
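As a rough check on what a first principal component of the summed-score matrix looks like, the following sketch extracts it by power iteration. Power iteration is a technique substituted here for convenience and is not the computational method used in the paper; the correlations are the summed-score values for the six retained tests as read from Table 1, with unities in the diagonal.

```python
import math

# Summed-score correlations for the six retained tests (as read from Table 1),
# order: Copying, Speed, Peeping, Faking, Athletic, Lying; unities in the diagonal.
R = [
    [1.000, 0.450, 0.400, 0.400, 0.288, 0.350],
    [0.450, 1.000, 0.374, 0.425, 0.345, 0.248],
    [0.400, 0.374, 1.000, 0.300, 0.100, 0.108],
    [0.400, 0.425, 0.300, 1.000, 0.300, 0.256],
    [0.288, 0.345, 0.100, 0.300, 1.000, 0.230],
    [0.350, 0.248, 0.108, 0.256, 0.230, 1.000],
]

def first_component(matrix, iterations=500):
    """Return the largest eigenvalue and the loadings of the first component."""
    n = len(matrix)
    v = [1.0] * n  # positive start vector, so the loadings come out positive
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    return eigenvalue, loadings

eigenvalue, loadings = first_component(R)
percent = 100 * eigenvalue / len(R)  # share of total variance in the matrix
print(round(percent, 1), [round(x, 3) for x in loadings])
```

With unities in the diagonal the total variance equals the number of tests, so the share of the first component is its eigenvalue divided by six.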

The hypothesis was that a signifi- 
cant amount of variance would be 
extracted by the first component and 
that all the tests would have high 
loadings on this component. The 
standards for determining the sta- 
tistical significance of the factor 
loadings and extracted variance are 
not yet agreed on by statisticians so 
that the decision as to evidence re- 
quired to reject the null hypothesis is 
arbitrary. It seemed reasonable to 
set the criterion at a minimum of 30% 
of the total variance in the matrix 
for the component to have a “g”
characteristic and for the loadings on
all tests to be a minimum of .40.
Furthermore, components extracted
after the first should account for much
less of the variance relative to the
first and should tend to be specific
to individual tests.


4 A complete description of these tests and
the procedures for scoring is given in
Hartshorne and May (1928), along with the other
tests which we have not used in our factor
analysis and therefore have not described.


Results 


The results for the principal
component analyses of the two matrices
are presented in Table 2. The matrix
based on the average intercorrelations 
(bottom half of Table 1) yields a 
component structure which barely 
meets our arbitrary criterion. As this 
matrix was computed in a conserva- 
tive manner, that is, contains what 
is probably the lowest estimates of 
the true correlations, these results 
can be considered to represent the 
minimum magnitude of common vari- 
ance and loadings for each component. 
The results for the other matrix 
(upper half of Table 1 based on 
summed scores) show that the magni- 
tude of the variance accounted for 
by the first component is larger as 
are the loadings of each test on this 
component. These results are based 
on matrices having unities in the 
diagonal and with the correlations 
being those given in Table 1 whose 
reliabilities were at least .70. These 
correlations were not corrected for 
attenuation. Analyses using relia- 
bilities in the diagonal produce similar 
results and are not reported here. 
The main change is that the amount 
of variance extracted by the first 
component is much greater using the 
reliabilities.5
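For readers unfamiliar with the correction for attenuation mentioned here, a minimal sketch follows. It uses the standard formula (the observed correlation divided by the square root of the product of the two reliabilities), which is assumed rather than quoted from the paper; the example values are the Copying and Speed summed scores from Table 1.

```python
import math

def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimated correlation between two measures if both were perfectly reliable."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Copying vs. Speed, summed scores: observed r = .450, reliabilities .871 and .825.
copying_speed = correct_for_attenuation(0.450, 0.871, 0.825)
print(round(copying_speed, 3))
```

Because every corrected value is at least as large as the observed one, a matrix corrected in this way will tend to yield a larger first component.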


5 I have also done analyses of these matrices
corrected for attenuation. Again the results
were approximately the same but show a
consistent increase in the magnitudes of the
loadings and amount of variance extracted
by the first component as the size of the
original correlations increased due to their
respective methods of calculation. The
results of these analyses even more strongly
indicate a generality conclusion. For
example, the amount of variance in the first
component for the matrix of average
correlations corrected for attenuation increases
from 48% to 69%. The second component
accounted for 19% of the variance and the
third factor tended to vanish. However, the
greater magnitude of the first component
produced by using matrices having
reliabilities in the diagonals and/or correlations
corrected for attenuation may be spurious. It
seems possible that the additional variance
extracted by the first of these component
matrices results from the common
“correction” for measurement error injected by the
reliability coefficients. Therefore, only the
results based on the uncorrected matrices
with unities in the diagonals are reported as
these are the most conservative estimates
of the common variance for these tests and
the most likely to reject the generality
hypothesis.


TABLE 2

PRINCIPAL COMPONENTS

                        Based on summed scores                  Based on average intercorrelations
                  I     II    III     IV      V     VI          I     II    III     IV      V     VI
A. Copying      .764   .092   .207  -.166  -.116   .570       .718  -.054   .013  -.046  -.444   .530
B. Speed        .754   .106  -.195  -.047   .614    …         .651    …    -.134  -.403   .607   .151
C. Peeping      .581   .660   .084  -.233  -.239  -.329       .540  -.237   .768  -.060  -.002  -.244
D. Faking       .703   .017  -.139   .677  -.168  -.029       .619   .140  -.088   .736   .217  -.028
F. Athletic     .555  -.504  -.537  -.293  -.234  -.095       .387   .825    …    -.177  -.240  -.282
I. Lying        .526    …     .638  -.031   .023  -.245       .561  -.501  -.474  -.105  -.207  -.394
% of Variance   42.8   16.1   13.4   10.5    8.9    8.4       34.6   17.0   14.0   12.5   11.9   10.0


These analyses conform in the main
to our hypothesis and lend support to
the generality position. In all cases,
the first component accounts for at
least twice as much variance as the
second component. Also, the loadings
on this component are all positive
and exceed .50, with the exception of
the athletic tests for the matrix of
averaged intercorrelations.

There appeared to be more than
one factor to be extracted, however,
which would indicate that there may
be some other common variance.
The second component rather
consistently accounts for about 17%
of the variance in the total matrix
and is related (before rotation) to
three of the tests. A weak third
component, common to the two
out-of-classroom tests, also was extracted
and accounted for about 14% of the
total variance.

Several criteria were considered in 
deciding when to end the extraction 
of components. First, the percentage 
of variance for each component was 
plotted. The curve flattened out 
after the third component suggesting 
that the analysis should be stopped 
regardless of the size of the sample. 
It is also seen that Components IV, 
V, and VI are specific to single tests. 
Tests of significance of the residual 
matrices (Lawley, 1940; Maxwell, 
1961) indicated that ns of under 200 
would have justified the extraction 
of the second and third components. 
These criteria encouraged the extrac- 
tion of at least two components and 
perhaps three. However, a theo- 
retical limitation on the extraction of 
variance is that the communality 
(h?) for any one test cannot be greater 
than the reliability of that test. 
This theoretical restriction makes 
even the second factor suspect. The 
communality for the third test (Peep- 
ing) exceeds its reliability by a small 
amount. With the extraction of the 
third component, three tests (Peep- 
ing, Athletic, and Lying) exceed their 
reliabilities. 

To help clarify these results, we
have proceeded to obtain a unique
solution using Lawley’s method of
maximum likelihood in order to test
for the sample size required to extract
more than the first factor. The
solution is given in Table 3 and shows
that samples of at least 333 for the
summed scores and 608 for the
averaged scores would be required to
reject the null hypothesis of the
adequacy of a single factor.6


TABLE 3
LAWLEY'S SOLUTION FOR A SINGLE FACTOR

                 Based on           Based on
                 summed scores      averaged scores
                 Factor loadings    Factor loadings
A. Copying            .697               .642
B. Speed              .686               .516
C. Peeping            .497               .412
D. Faking             .607               .481
F. Athletic           .443               .275
I. Lying              .415               .442
                 N > 333*           N > 608*
                 N > 500**          N > 914**

* p = .05, df = 15.
** p = .001, df = 15.

We see that these analyses are the 
most conservative in testing our 
hypothesis but still produce results 
very similar to those of the principal 
component model. The athletic tests 
contribute very little to this general 
factor for the averaged correlations, 
with 92% specific variance for these 
tests. But the analysis based on the 
summed scores meets our original 
arbitrary criterion for factor loadings 
of at least .40 for all tests. 
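The figure of 92% specific variance follows from the single-factor model, in which the variance a test does not share with the factor is one minus its squared loading. The sketch below applies that relation to the loadings as read from Table 3; the dictionary layout and variable names are illustrative only.

```python
# Single-factor loadings as read from Table 3 (order of tests as in the table).
lawley = {
    "summed": {"Copying": 0.697, "Speed": 0.686, "Peeping": 0.497,
               "Faking": 0.607, "Athletic": 0.443, "Lying": 0.415},
    "averaged": {"Copying": 0.642, "Speed": 0.516, "Peeping": 0.412,
                 "Faking": 0.481, "Athletic": 0.275, "Lying": 0.442},
}

# Specific (unique) variance under a one-factor model: 1 - loading squared.
specific = {
    matrix: {test: 1 - loading ** 2 for test, loading in tests.items()}
    for matrix, tests in lawley.items()
}

# Does each matrix meet the criterion of loadings of at least .40 on every test?
meets_criterion = {
    matrix: all(loading >= 0.40 for loading in tests.values())
    for matrix, tests in lawley.items()
}

print(round(specific["averaged"]["Athletic"], 3), meets_criterion)
```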

If a conservative judgment is made
from all our analyses, only the first
component is permitted. Such a
conclusion would clearly support a
single factor hypothesis with some
consideration that the athletic tests
are independent of the other types
of tests. For those readers who agree
with this conclusion, the following
rotations of components will be
superfluous. Since others may feel that
these considerations are too severe
with a principal component model
which achieves a unique solution
accounting for all the variance in the
matrix, we have rotated the first
three factors extracted in our analyses.


6 The estimated factor loadings and
communalities for beginning this analysis were
taken from the principal axis solutions. I
would like to acknowledge the advice and
direct assistance of Donald F. Morrison of
the National Institutes of Health in
performing these analyses.


Rotated Factor Structure 


Three factors were orthogonally 
rotated by Kaiser's (1958) analytic 
varimax model. The rotated factors 
indicate that there seems to be a 
difference between those tests ad- 
ministered inside the classroom and 
the athletic tests which are given in 
an out-of-class setting. Factor I’ 
for the summed scores seems to 
clearly indicate a classroom cheating 
factor which involves actual behavior. 
The second factor (II’) is mainly a 
performance cheating factor with the 
main loading from the athletic tests 
and rather substantial loadings from 
the speed and faking tests, all of 
which involve some kind of physical 
performance. The third factor (III’) 
is defined primarily by the question- 
naire test on acceptance of the general 
moral code. The classroom copying 
tests also contribute to this factor. 
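The relation between the unrotated components and the rotated factors can be illustrated by multiplying the first three component loadings by the transformation matrix. The sketch below uses values as read from Tables 2 and 4 for the summed scores; the Lying row is omitted because one of its unrotated loadings is not legible in this copy, and treating II’ as the reflected factor is an assumption made here.

```python
# Unrotated loadings on Components I-III (summed scores), as read from Table 2.
unrotated = {
    "Copying":  [0.764, 0.092, 0.207],
    "Speed":    [0.754, 0.106, -0.195],
    "Peeping":  [0.581, 0.660, 0.084],
    "Faking":   [0.703, 0.017, -0.139],
    "Athletic": [0.555, -0.504, -0.537],
}

# Transformation matrix for the summed scores, as read from Table 4.
T = [
    [0.731, -0.535, 0.424],
    [0.680, 0.514, -0.523],
    [0.061, 0.670, 0.740],
]

REFLECTED = 1  # assumption: the reflected factor is II' (column index 1)

def rotate(row, transform, reflected=REFLECTED):
    """Post-multiply one row of loadings by the transformation matrix."""
    out = [sum(row[k] * transform[k][j] for k in range(3)) for j in range(3)]
    out[reflected] = -out[reflected]  # reflect the sign of one factor
    return out

rotated = {name: rotate(row, T) for name, row in unrotated.items()}
for name, row in rotated.items():
    print(name, [round(x, 3) for x in row])
```

An orthogonal rotation leaves each test's communality unchanged, which gives a further check on any row of the rotated table.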

The rotated factors for the aver- 
aged cross correlations indicate a 
somewhat similar structure. The 
main difference is in the exchange of 
places by the peeping and lying tests, 
and by the greater degree of specific- 
ity of the second and third factors. 
Factor I’ is again a classroom test 
factor but is defined more by the 
lying test than the actual behavioral 
tests. The second factor (II’) is again 
an athletic dimension. The peeping
tests are specific to the last factor
(III’).


TABLE 4
ORTHOGONAL ROTATION OF FACTORS

                       Summed scores                    Averaged cross-r
Tests             I’    II’*    III’     h2        I’     II’    III’     h2
A. Copying      .633    .223    .428    .634     .600    .205     …      .519
B. Speed        .611    .479    .120    .617     .581    .288    .161     …
C. Peeping      .878   -.085   -.037    .780     .142   -.013    .957    .937
D. Faking       .516    .460    .187    .513     .509    .349    .173    .411
F. Athletic     .030    .916    .101    .851     .090    .907   -.020    .831
I. Lying        .081     …      .959    .939      …     -.280   -.068    .791

Transformation matrices

                .731   -.535    .424             .821    .355    .447
                .680    .514   -.523            -.300    .935   -.191
                .061    .670    .740              …       …       …

* This factor has been reflected.


These results are what we might
have expected from our Lawley
solutions. We see that the athletic tests
tend to have specific variance in the
Lawley solution and to define the
second factor in the rotational
analyses. Also the copying, speed, and
faking tests tend to be more “general”
in that their loadings are all positive
on the rotated factors, contribute to
more than one of the factors, and have
the largest communalities in the
Lawley solutions. It is not clear
whether the peeping and lying tests
should be included in the general
factor of classroom honesty tests or
should be considered as specific tests.
From Brogden’s (1940) analysis, to
be discussed below, it seems that the
more stable structure includes the
peeping tests in the first factor, with
the lying tests tending to be
independent.


Simplex Analysis


Guttman (1955) has developed an
alternative model to factor analysis
as a way of investigating the
single-common-factor hypothesis.7 He
proposes that if a matrix of
intercorrelations can be ordered in a
hierarchical gradient conforming to
certain criteria the tests can be
considered to be in the same universe and
to lie along a single dimension, an
ordering he has called a simplex.

These criteria are that “the largest 
correlations are all next to the main 
diagonal, and taper off as one goes 
to the upper right and lower left of 
the table,” and that the totals of the 
columns will be curvilinear with the 
lowest totals at the right and left 
extremes and the largest total in the 
middle. Tables 5 and 6 present an 
ordering of the Hartshorne and May 
intercorrelation matrix into quasi-sim- 
plexes. The preponderance of “errors” 
in the ordering are contributed by the 
Home (E), Stealing (H), and Parties 
(G) tests. These tests either have very
low or unreported reliabilities, which
would in itself tend to introduce errors
into the ordering. It will be seen that
the matrix within the heavy lines
approaches a perfect simplex. These
are the same tests we have chosen
for the principal component analyses
on the basis of acceptable reliabilities.
Our results again support the
generality hypothesis.

A comparison of the ordering of the
same tests utilized in the factor
analysis shows that only the tests in
the center of the simplex (copying,
speed, and faking) contribute
positively to all the rotated factors. The
lying, peeping, and athletic tests
contribute only to one factor and are
seen to be at the extremes of the
simplex. It is interesting that the
positions of the two extreme tests in the
two simplex orderings correspond to
the two specific, rotated factors; i.e.,
the athletic tests define Factor II’
for both the rotational analyses and
are also at the far right in both
simplexes, and the lying tests define
the specific Factor III’ for the summed
scores and are at the extreme left
for this simplex, whereas the peeping
tests have these characteristics for
the averaged cross-correlation matrix.
It is also notable that these extreme
tests contribute the only errors in the
simplex.


7 I would like to thank Morris Rosenberg
for bringing this alternative method to my
attention after I had already completed the
factor analyses.


490 Roger V. Burton

TABLE 5
GUTTMAN SIMPLEX OF SUMMED SCORES

                                                Summed scores
Tests                    E       I       C       A       B       D       F       H       G
E. Home (1 test)      (.240)   .400    .234    .172    .193     --     .142   -.010   -.015
I. Lying (1 test)      .400   (.836)   .108    .350    .248    .256    .230    .132   -.004
C. Peeping (3 tests)   .234    .108   (.721)   .400    .374    .300    .100    .200    .250
A. Copying (3 tests)   .172    .350    .400   (.871)   .450    .400    .288    .143    .118
B. Speed (6 tests)     .193    .248    .374    .450   (.825)   .425    .345    .173    .169
D. Faking (3 tests)     --     .256    .300    .400    .425   (.750)   .300    .346    .122
F. Athletic (4 tests)  .142    .230    .100    .288    .345    .300   (.772)   .283    .118
H. Stealing (1 test)  -.010    .132    .200    .143    .173    .346    .283     --     .210
G. Parties (3 tests)  -.015   -.004    .250    .118    .169    .122    .118    .210     --
Total                 1.116a  1.720   1.966   2.321   2.377   2.449a  1.806   1.477    .968

a These totals are minus any contributions from rDE.


TABLE 6
GUTTMAN SIMPLEX OF AVERAGE CROSS-CORRELATIONS

                                           Average cross-r
Tests             E       C       I       A       B       D       F       H
E. Home        (.240)   .187    .400    .154    .141     --     .087   -.010
C. Peeping      .187   (.462)   .161    .285    .219    .196    .062    .160
I. Lying        .400    .161   (.836)   .312    .254    .208   -.003    .132
A. Copying      .154    .285    .312   (.696)   .292    .291    .198    .127
B. Speed        .141    .219    .254    .292   (.440)   .255    .194    .128
D. Faking        --     .196    .208    .291    .255   (.500)   .184    .283
F. Athletic     .087    .062   -.003    .198    .194    .184   (.458)   .162
H. Stealing    -.010    .160    .132    .127    .128    .283    .162     --
Total           .959a  1.270   1.464   1.659   1.483   1.417a   .884    .982

a These totals are minus any contributions from rDE.
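The two simplex criteria quoted earlier (correlations tapering off away from the main diagonal, and column totals rising to a peak in the middle) can be checked mechanically. The sketch below does so for the six retained tests in the summed-score ordering, using correlations as read from Table 5; the helper names and the way an "error" is counted are choices made here for illustration, not Guttman's procedure.

```python
# Summed-score correlations for the six retained tests in simplex order
# (as read from Table 5); the diagonal is ignored by the checks below.
tests = ["Lying", "Peeping", "Copying", "Speed", "Faking", "Athletic"]
R = [
    [1.0, 0.108, 0.350, 0.248, 0.256, 0.230],
    [0.108, 1.0, 0.400, 0.374, 0.300, 0.100],
    [0.350, 0.400, 1.0, 0.450, 0.400, 0.288],
    [0.248, 0.374, 0.450, 1.0, 0.425, 0.345],
    [0.256, 0.300, 0.400, 0.425, 1.0, 0.300],
    [0.230, 0.100, 0.288, 0.345, 0.300, 1.0],
]

def count_increases(seq):
    """Number of steps at which a sequence rises instead of tapering off."""
    return sum(1 for a, b in zip(seq, seq[1:]) if b > a)

def row_errors(matrix, i):
    """Errors in row i: correlations that grow when moving away from the diagonal."""
    right = [matrix[i][j] for j in range(i + 1, len(matrix))]
    left = [matrix[i][j] for j in range(i - 1, -1, -1)]
    return count_increases(right) + count_increases(left)

def is_unimodal(values):
    """True if the values rise strictly to a single peak and then fall strictly."""
    peak = values.index(max(values))
    rising = all(a < b for a, b in zip(values[:peak], values[1:peak + 1]))
    falling = all(a > b for a, b in zip(values[peak:], values[peak + 1:]))
    return rising and falling

errors = [row_errors(R, i) for i in range(len(R))]
totals = [sum(row) - 1.0 for row in R]  # column totals excluding the diagonal

print(dict(zip(tests, errors)))
print([round(t, 3) for t in totals], is_unimodal(totals))
```

Counted this way, the only departures from a perfect simplex fall in the rows of the two tests at the extremes of the ordering.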


Related Studies 


Other investigators have come to 
similar conclusions regarding the
generality of honest behavior. One of
the first after Hartshorne and May’s 
final volume was a short paper pub- 
lished by Maller (1934) who had been 
a co-author in Studies in Service and 
Self-Control (Hartshorne et al., 1929). 
Maller analyzed the correlations of 
the summary scores for the character 
tests of Honesty, Cooperation, Inhi- 
bition, and Persistence as reported in 
Studies in the Organization of Character 
(Hartshorne et al., 1930). He utilized 
Spearman’s tetrad difference tech- 
nique and concluded that there was 
evidence for a common factor in all 
three matrices which were based on 
quite different populations. He inter- 
preted the common factor as being 
delay of gratification: “the readiness 
to forego an immediate gain for the 
sake of a remote but greater gain.” 
He pointed out, however, that one 
should be cautious in accepting his 
analysis as proof of a general factor 
due to the very low magnitude of the 
original intercorrelations. On the 
other hand, he predicted that higher 
correlations would be forthcoming 
when character tests were constructed 
with greater reliability and validity. 

Brogden (1940) also utilized the 
factor analytic model in his analysis 
of 40 character tests. The inter- 
correlations were based on a sample 
of 100 middle-class boys with average 
or above IQ. Four of the tests were 
the same or similar to the Hartshorne 
and May (1928) tests of deceit and
six were the same paper-and-pencil
character tests used in Studies in the
Organization of Character (Hartshorne
et al., 1930). The most clearly defined
factor obtained in this analysis was 
an honesty factor. All the behavioral 
tests of cheating had high loadings on 
this factor, and two of the paper-and- 
pencil tests contributed to a small 
extent. Brogden suggested, as did 
Maller, that the paper-and-pencil 
tests could be refined to correlate 
more highly with the honesty factor 
by doing an item analysis on two 
groups of subjects with extreme scores 
on the factor; but this analysis is not 
reported. Brogden also found an 
“acceptance of the moral code” factor. 
It is interesting that these two factors 
are orthogonal to one another in this 
analysis. The honesty factor consists 
mainly of the behavioral tests whereas 
the “moral code” factor is defined 
by paper-and-pencil tests and story 
completions which measured how 
much the child would express the 
socially acceptable (desirable) re- 
sponse. These results indicate that 
even though there are some paper- 
and-pencil tests which contribute to a 
behavioral factor of honesty the ele- 
ments in them are not well determined 
and that the cognitive aspect of 
morality seems for the most part to 
be independent of the behavioral 
choice situation. 

Barbu (1951) reports a program of 
research dealing with honesty in 
children which he conducted in Rou- 
mania between 1935 and 1940. In 
one study of 250 14-year-old boys 
tested with nine behavioral tests and 
one questionnaire test of honesty, he 
found an average intercorrelation of 
.456 and concluded there was strong 
evidence for a general trait of honesty. 
He also analyzed his data using 
Thurstone’s multiple factor model 
and found evidence of a general factor.
To some extent, however, the con- 
sistency of Barbu's results may be due 
to his choice of tests which are all 
similar to the Hartshorne and May 
classroom tests which had the high 
loadings on the general factor in our 
analysis. 


Discussion 


The results of our analyses, and 
those of Maller, Brogden, and Barbu, 
lead us to reconsider the specificity 
hypothesis regarding behavioral hon- 
esty in favor of a more general 
position. 

Previous writers have also given 
theoretical consideration to this ques- 
tion and decided in favor of a general- 
ity of behavior. Allport (1937) pre- 
sented challenging arguments against 
the specificity position pointing out 
the difficulties involved in predicting 
the important “identical elements” 
in different situations. Eysenck’s 
(1953) review of Hartshorne and May 
points out the intercorrelations of 
these types of measures, each based 
on from one to six behavioral tests, 
should not be expected to reach the 
magnitudes of intercorrelations based 
on intelligence tests composed of 50 
or more items. This consideration 
does make Hartshorne and May’s 
criterion of a theoretical predictive 
reliability of .90 for acceptable evi- 
dence of a generality of honesty be- 
havior quite stringent. The obtained 
theoretical predictive reliability for 
their tests was .725 based on an 
average intercorrelation for the nine 
types of tests of .227, which included 
the tests with the very low reliabili- 
ties. A battery of 31 such honesty 
tests would be required to obtain the 
theoretical criterion of .90, assuming 
that the average inter-r remains the 
same. By eliminating the tests with 
low reliabilities, the average inter-r 
increases to .305, which still gives 
just .725 as the theoretical predictive
reliability for the remaining 6 tests. 
However, only 21 tests would be 
necessary to reach the criterion of .90 
reliability with another battery of 21 
similar honesty tests.8

The conclusion to draw from these 
analyses is not greatly different from 
that made by Hartshorne and May, 
but the strong emphasis on lack of 
relation between tests is removed. 
Our analyses indicate that one may 
conclude there is an underlying trait 
of honesty which a person brings with 
him to a resistance to temptation 
situation. However, these results 
strongly agree with Hartshorne and 
May’s rejection of an “all or none” 
formulation regarding a person’s char- 
acter. I feel the results can best be 
incorporated by a learning model 
which would predict a generalization 
gradient over the different types of 
tests. Since all the cheating tests 
have much face validity of being in the 
area of resistance to temptation, one 
would expect that the generalization 
gradient would extend to all the tests. 
This expectation is supported by the 
evidence of a general factor under- 
lying the intercorrelations of the tests. 
However, the model would also pre- 
dict that as the tests become less 
similar the probability of the same 
response in both situations would be- 
come less and less. This prediction 
would account for the decrement in 
the magnitude of the correlations as 
the situations become more dissimilar. 

This model would make some addi- 
tional predictions about consistency 
in responses over the different tests 
contingent on different learning condi- 
tions of the subjects. It would predict 
that the parent who consistently de- 
fines all temptation situations the 


8 For fuller discussions of the measurement
problems and theory on which these computa- 
tions are based, see Hartshorne and May 
(1928) and Eysenck (1953). 
same way as interpreted in the honesty
tests and also consistently administers
positive reinforcement for
honest behavior and punishment for 
dishonest behavior would facilitate 
for his child the discrimination of the 
critical cues in situations which call 
for an honest response. With these 
critical cues discriminated, the child 
should show much generality in his 
behavior across the different types of 
honesty tests. On the other hand, 
parents who define cheating in one 
situation as being unacceptable, e.g., 
stealing money, but do not censure 
cheating in another situation in which 
a highly valued gain may be obtained, 
e.g., cheating on a college entrance 
examination, would produce children 
who are not consistent on these 
honesty tests.9 These children may
learn to be honest in particular situations
but would not learn to discriminate
the critical elements calling
for an honest response in any situation
which involves a moral choice. But 
I must emphasize I mean here con- 
sistency in defining the situation for 
the child as calling for an honest 
response and in administering positive 
or negative reinforcement contingent 
on his response. For we would expect 
from the experimental literature that 
once a particular response is well 
established, a kind of inconsistency in 
the predictability of the reinforcer will 
flatten out the generalization gra- 
dient (Humphreys, 1939; Wickens, 
Schroder, & Snide, 1954). Such a 
variable reinforcement schedule will 
also increase the resistance of the 


9 There is the assumption made in this area 
of research that the “average” parent in our 
American culture agrees on what is honest 
and dishonest behavior in the test situations. 
There may be some cases in which parents 
are slightly psychopathic and would not con- 
sider it wrong to cheat; but in sampling a 
large group of subjects, these deviant cases 
should not contribute too much “noise” to 
the analyses. 
response to extinction, i.e., conditions
of no longer being reinforced. Thus 
parental consistency in interpreting 
the moral elements of a situation and 
in the positive or negative character- 
istic of the reinforcement they dis- 
pense depending on the child’s be- 
havior, combined with a gradual 
inconsistency in their dispensing of 
such reinforcement are the conditions 
maximizing the learning, generaliza- 
tion, and persistence of a moral 
response. 

An important aspect of this general- 
ization model in predicting and ex- 
plaining moral behavior is the part 
played by cognitive mediation. In 
addition to the generalization gra- 
dient in which only the elements of 
the original stimulus complex and 
response are concerned, there is an- 
other gradient involving cognitively 
mediated generalization. This part 
of our model would predict that the 
greater the cognitive, especially ver- 
bal, association between two kinds 
of temptation situations, the greater 
will be the probability of the same 
response being performed in both 
settings. It would be possible, there- 
fore, to place an individual in a test 
situation appearing to be totally new 
in his experience but yet having some 
elements which he would define as a 
temptation conflict. That is, there 
may be very little similarity as far as 
the immediate stimulus complex is 
concerned, but there are elements of 
a cognitive nature by which media- 
tional generalization may occur. 
When both specific stimulus generali- 
zation and cognitive mediation are 
combined, the probability of predict- 
ing behavior from one situation to 
another should be some additive 
function of these two generalization 
gradients.10
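
The idea of an additive combination of the two gradients can be made concrete with a toy sketch. Everything specific in it is an assumption introduced for illustration and is not taken from the data: the exponential form of each gradient, the equal weights, the slope, and the names `gradient` and `p_same_response`.

```python
import math


def gradient(dissimilarity: float, slope: float = 1.0) -> float:
    """Hypothetical generalization gradient: 1.0 for identical
    situations, falling toward 0 as dissimilarity grows."""
    return math.exp(-slope * dissimilarity)


def p_same_response(stimulus_dissim: float,
                    cognitive_dissim: float,
                    w_stimulus: float = 0.5,
                    w_cognitive: float = 0.5) -> float:
    """Assumed additive combination of the stimulus gradient and the
    cognitively mediated gradient (weights are illustrative)."""
    return (w_stimulus * gradient(stimulus_dissim)
            + w_cognitive * gradient(cognitive_dissim))


if __name__ == "__main__":
    # A situation that looks new (large stimulus dissimilarity) but is
    # labeled as the same kind of temptation conflict.
    print(p_same_response(stimulus_dissim=5.0, cognitive_dissim=0.0))
    # The same novel situation without any shared cognitive label.
    print(p_same_response(stimulus_dissim=5.0, cognitive_dissim=5.0))
```

In this sketch a situation with almost no stimulus similarity still yields a substantial predicted probability when the cognitive dissimilarity is zero, which is the case of mediated generalization described above.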

10 MacRae (1954) postulated “two distinct
processes of moral development” which
are analogous to the two generalization gra- 
Theoretically, it seems these two
generalization gradients may be quite 
independent. The child rearing prac- 
tices of some parents may be very 
appropriate for the learning of honest 
behavior in particular settings and 
for the broad generalization of such 
behavior to other similar stimulus 
complexes. But these same parents 
may not apply verbal labels to such 
situations. Their children are learn- 
ing to be honest in specific situations, 
and any generalization of their be- 
havior will come through similarity 
of new situations to these specific 
learning conditions. Other parents 
may emphasize the verbal labeling 
of situational conditions so that their 
children learn to discriminate certain 
cognitive elements in quite different 
kinds of stimulus situations.11 These
children are learning that under cer- 
tain abstract conditions one should 
act in some ways and not in others. 
However, some of these same parents 
may not be efficient in teaching 
their children to perform the desir- 
able response under these conditions. 
These children would know the ac- 
ceptable moral code in many temp- 
tation situations, but such knowledge 
would not necessarily determine their 
overt choice behavior.12 In actual


dients proposed here. Although his data were 
all based on what we have called the “cogni- 
tive” type of measure, he hypothesized a 
“cognitive” moral development, involving 
the learning of what behavior patterns are 
approved and disapproved, and “emo- 
tional” moral development, including the 
association of anxiety with one’s own 
deviance and moral indignation with that
of others [p. 17].

11 Although this notion seems to be popular
now, I believe John W. M. Whiting first 
suggested this hypothesis regarding the 
importance of verbal labeling in learning 
moral standards. 

12 More extended discussions of the child 
rearing practices considered conducive for 
learning resistance to temptation and honest 
behavior are in Burton (1959), and Burton, 
Maccoby, and Allinsmith (1961). 
practice it seems from our analyses
that the majority of parents, espec- 
ially of the middle class on which 
most research in honesty has been 
done, attempt to achieve both kinds 
of generalization in their children. 
Experimental results also indicate 
the influence of these two kinds of 
generalization on behavior. The gen- 
eralization of specific stimulus situa- 
tions is well demonstrated in the 
literature (e.g., Osgood, 1953). But 
experiments in which both the specific, 
external stimuli and cognitive ele- 
ments can be simultaneously assessed 
for their relative contributions to 
generalization are somewhat rare. 
Hull (1920) demonstrated that a 
concept can be learned even though 
the critical cues might not be subject 
to conscious awareness, indicating the 
discrimination and generalization of 
very specific cues embedded in a 
conceptual task without the need 
of verbal mediation. Bugelski and 
Scharlock (1952) extended this find- 
ing to show that actual verbal media- 
tion can also occur without awareness. 
Other experiments indicate, however, 
that conscious verbal mediation facili- 
tates discriminations in new situations 
where the verbal labels are still 
relevant (Goodwin & Lawrence, 1955; 
Kendler & Kendler, 1959; Kuenne, 
1946). Such discrimination would in- 
crease the probability of generalizing 
to different situations in which these 
labels continue to be appropriate. 
The differences in types of discrimina- 
tion and generalization associated 
with age and intelligence further 
demonstrate the distinction between 
the dimensions of purely external 
stimulus elements and cognitive label- 
ing (Kendler, Kendler, & Wells, 1960; 
Kuenne, 1946; Luria, 1957). 
Relating this experimental evidence 
to the honesty data, the analyses of 
Maller (1934) and Brogden (1940) 
indicate that to some extent most 
parents are inculcating both generalization
of a specific, situational sort
and also of a more cognitive kind. 
The ordering of the Hartshorne and 
May tests into a quasi-simplex also 
indicates that the students were 
defining these different tests as being 
in the same realm. The further fact 
that those tests which could be ordered 
into a nearly perfect simplex clearly 
had stimulus elements in common 
shows that both kinds of generaliza- 
tion appear to be influencing these 
results. Brogden’s finding two factors 
which seem to measure these kinds of 
generalization, and the rotational 
analysis of the summed scores, pro- 
vide some empirical support for the 
behavioral and cognitive dimensions 
contributing separate variance to 
these honesty tests. More recent 
investigations also indicate that be- 
havioral measures and cognitive in- 
dices of morality are not necessarily 
correlated (Burton, 1959; Burton 
et al., 1961; Unger, 1960). This 
consideration follows the point made 
by Maccoby (1959) that comparisons 
between different studies of morality 
involve the problem of reliability 
across measures, i.e., their intercorre- 
lations, even though they may be 
highly reliable tests themselves. Thus, 
if one study employs Hartshorne and 
May’s peeping test and another uses 
the lying questionnaire, there will 
probably not be great agreement in 
the results as there is so little overlap 
in the tests. The model we have 
presented indicates that the tests 
may differ on at least two dimensions: 
they may test different environmental 
settings (e.g., in-classroom versus 
out-of-classroom, in-school versus out- 
of-school), and they may differ in the 
extent to which they test actual choice 
behavior in a conflict situation versus 
cognitive structuring of hypothetical 
conditions as in a paper-pencil ques- 
tionnaire or Piaget-type interview. 
Campbell and Fiske (1959) have suggested
a procedure which appears
relevant for these issues. Their 
recommendation is to use several 
methods to measure a number of dif- 
ferent traits in order to assess the 
convergent and discriminant validity 
of the tests and of the constructs they 
are purported to measure. The last 
book in the Character Education Inquiry
of Hartshorne and May (1930),
Studies in the Organization of Character,
comes close to considering this
same procedure. The four traits of 
honesty, cooperation, inhibition, and 
persistence were measured by different 
tests utilizing different methods. Fur- 
thermore, they employed the be- 
havioral tests in different settings. 
Unfortunately, the original intercor- 
relations between all the individual 
tests are not reported so that the 
evaluations recommended by Camp- 
bell and Fiske are not possible. 
Hopefully, a study based on a multi- 
trait-multimethod-multisetting design 
employing reliable tests on an ade- 
quate sampling of subjects will be 
done making possible the different 
analytical approaches used in this 
paper as well as those recommended 
by Campbell and Fiske in order to 
elucidate more directly the questions 
we are considering. Other traits 
which would seem relevant for a multi- 
trait design are guilt, achievement 
motivation, rigidity, conformity-com- 
pliance, and social desirability. But 
these are considerations for the future. 

Let us look now at some implica- 
tions of this double-generalization 
model and relate them to some 
research findings. 

Intelligence. Hartshorne and May 
found that IQ was positively cor- 
related with honesty (r = .344). 
Shuttleworth’s analysis (Hartshorne 
et al., 1930) in the last volume showed 
a strong relationship (r = .776) be- 
tween honesty and consistency in 
behavior, i.e., honest persons tended
to be consistent in behavior and 
dishonest persons tended to be in- 
consistent. As might be expected 
from these relationships, intelligence 
and consistency of honest behavior 
were also related (r = .226). These 
results are consonant with specula- 
tions from our model that the general- 
ity of honesty would be positively 
related to intelligence. The tempta- 
tion is to end presenting the data 
from their analyses at this point. But 
all the evidence is not so strongly in 
this direction. When honesty is 
partialed out, the relation between 
consistency and intelligence tends 
to disappear. The relation between 
IQ and honesty remains significant 
at .216 even when controlling on 
consistency. As the authors noted, 
they may have partialed out too much 
when they controlled on honesty and 
consistency so that there is prob- 
ably some real association between 
IQ and consistency. But the results 
suggest that intelligence is more 
strongly related to behavioral honesty 
than to consistency. Our model 
would indicate that IQ should be 
especially relevant for tests of knowl- 
edge of a consensual moral code. 
This assumes that conceptual general- 
ization will be positively related to 
IQ. We would expect greater verbal 
mediation from persons with higher 
IQ as they should be more capable 
of abstracting the moral implications 
of the different test situations. Thus, 
at least part of the greater consistency 
of honest persons who tend also to 
have higher IQs may be accounted 
for by the cognitive generalization 
gradient of our model. As we are 
unable to separate the cognitive tests 
from the behavioral tests used by 
Hartshorne and May, we are also 
unable to test our speculations re- 
garding these differences between 
behavioral and cognitive measures. 
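
The partialing described in this paragraph can be illustrated with the standard first-order partial correlation formula applied to the three zero-order correlations quoted above. This is a sketch under an assumption: that the three coefficients can be treated as coming from one sample. The values it prints therefore need not reproduce the .216 reported in the text, which may rest on different groups or corrections. The names used are illustrative.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Zero-order correlations as quoted in the text
R_IQ_HONESTY = 0.344
R_HONESTY_CONSISTENCY = 0.776
R_IQ_CONSISTENCY = 0.226

# IQ with consistency, honesty partialed out
iq_consistency_given_honesty = partial_r(
    R_IQ_CONSISTENCY, R_IQ_HONESTY, R_HONESTY_CONSISTENCY)

# IQ with honesty, consistency partialed out
iq_honesty_given_consistency = partial_r(
    R_IQ_HONESTY, R_IQ_CONSISTENCY, R_HONESTY_CONSISTENCY)

if __name__ == "__main__":
    print(round(iq_consistency_given_honesty, 3))
    print(round(iq_honesty_given_consistency, 3))
```

On these inputs the relation of IQ to consistency falls close to zero once honesty is held constant, while the relation of IQ to honesty stays clearly positive with consistency held constant, which is the pattern the paragraph describes.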
Age. In line with this interpretation
would be the expectation that
generality should increase with age. 
Cognitive moral development has 
consistently been positively related 
to age in studies involving children’s 
conceptions of morality (Boehm, 1957; 
Bronfenbrenner, 1962; Durkin, 1959; 
Harrower, 1935; Hoffman, 1961; 
Kohlberg, 1958, 1963; Lerner, 1937;
MacRae, 1954; Medinnus, 1959;
Morris, 1958; Peel, 1959; Piaget,
1932). Experimental results also
indicate that, with age, verbal mediation
and the control such verbalizations
have on overt choice behavior
increase (Kendler et al., 1960; Kuenne,
1946; Luria, 1957). However, in 
Studies in Deceit (Hartshorne & May, 
1928) age was slightly negatively 
correlated with honesty. The tend- 
ency for a negative correlation of age 
with honesty when honesty is posi- 
tively correlated with consistency is 
not in harmony with our model. In 
this case the further analysis in 
Volume 3 (Hartshorne et al., 1930) 
reveals data supporting our predic- 
tions. For the two groups of children 
studied intensively in this volume it 
was found that both groups became 
more consistent with age but the high 
social class children became more 
honest and those from the lower-class 
school became more dishonest. 

Social class. Kohn (1959) has 
found that there are different value 
systems characteristic of the working 
class and the middle class. The 
working-class parents stress the im- 
mediate implications of a child’s act 
and want the child to stay out of 
trouble by not doing the “wrong” 
thing, whereas the middle-class par- 
ents want their child to understand 
the implications of his behavior so
that he chooses to do the “right” 
thing. If these interpretations are 
correct, one would expect that the 
child rearing practices of middle-class 
parents would be more conducive to
their children’s discriminating the 
moral implications of different settings 
and performing an honest response 
than would be the child rearing prac- 
tices of the working-class parents. 
These different value systems and 
presumably related child rearing prac- 
tices should produce both greater 
situational generalization and greater 
cognitive generalization in the middle- 
class children than in the working- 
class children. We should expect 
these differences to be reflected in 
greater generality on tests of morality 
of both a behavioral response type 
and a cognitive knowledge kind. 
The findings that behavioral honesty 
was positively correlated with social 
class and that the consistency for the 
upper social class group was signifi- 
cantly greater than that for the lower- 
class group strongly support this 
expectation. Their analysis showing 
age related to consistency with the 
upper social class group increasing in 
consistency faster than the lower-class 
group is also directly in line with our 
model. Researchers using cognitive 
measures of morality have consist- 
ently found their measures related 
to social class (Aronfreed, 1961; 
Boehm, 1957; Bronfenbrenner, 1962; 
Durkin, 1959; Hoffman, 1961; Kohl- 
berg, 1958, 1959, 1963; Lerner, 1937; 
MacRae, 1954), but in general have 
not analyzed their data to test for 
differences in consistency between 
classes. 

Sex. Of the eight types of tests 
used by Hartshorne and May (1928), 
three showed significant, and three 
more nearly significant, differences in 
girls cheating more than boys. They 
attributed these differences to the 
possibility that girls were more moti- 
vated to succeed on a school task and 
to conform to accepted standards 
than were boys. One of these tests 
was the lying test which measured 
the student's tendency to score himself
as conforming to acceptable
standards. The other tests were more 
directly behavioral in a temptation 
situation. Generally, girls tend to 
be more verbally developed than boys 
(Goodenough, 1954) on intelligence 
and achievement tests, although there 
are exceptions (Bayley, 1957). Girls 
are also rated as being more honest 
than boys (Hartshorne & May, 1928) 
and as developing a conscience earlier 
than boys (Sears, Maccoby, & Levin, 
1957). In light of these findings, we 
might predict from our verbal gen- 
eralization model that girls would be 
more consistent than boys. But there 
is also the evidence that girls tended 
actually to cheat more and also that 
girls tended to make up the “pure” 
(i.e., consistently) deceptive group, 
and boys the pure honest group in 
Hartshorne and May (1928). It 
would seem that the prediction would 
have to be limited to verbal or cogni- 
tive measures of morality, what 
Brogden (1940) called “acceptance of 
the moral code” factor, so that girls 
would evidence greater generality of 
morality only on verbal measures. 
Barbu (1951) reports sex differences 
for areas involved in a questionnaire 
lying test. Boys lied more than girls 
about power and courage, whereas 
girls lied more about being morally 
good. One interpretation for these 
differences is that they reflect real 
differences in behavior, i.e., boys’ 
behavior is more courageous and 
assertive than girls’, whereas girls’ 
behavior does conform to “goodness” 
more than boys’ behavior. But it is 
also possible that the cognitive meas- 
ures of acceptance of the moral code 
are mainly addressed to the areas of 
morality which are more salient to 
girls who therefore are more motivated 
than boys to distort their responses. 
If this were so, girls would tend to 
appear more moral than boys on such 
tests which are not measuring lying
or distortion but only cognitive accept- 
ance of morality. Aronfreed (1961) 
has shown that girls appear to be 
much more concerned with display 
of being “good” than are boys. Be 
this as it may, why is it that boys 
should tend to be more honest and 
more consistent on the Hartshorne 
and May temptation tests but girls 
tend to appear more conforming to 
the general moral code as measured 
by verbal tests or ratings by parents 
or teachers? The model we are pro- 
posing would lead us to investigate 
the possibility of differential child 
rearing practices contingent on the 
sex of the child and of differential 
role modeling by the parents.13


13 It is suggested here that direct methods
of observation or experimental designs in the 
home or in natural situations be used to 
obtain measures of child rearing. It seems 
that the important differences in child rearing 
for boys and girls, especially at very young 
ages, may be too subtle to be measured by 
interview techniques. 
This paper presents alternatives to Mowrer’s concepts of fear, hope, relief,
and disappointment. The 4 concepts which are presented are not
defined as increments or decrements in the fear response (as in Mowrer) 
but are developed within the framework of a motive-expectancy-value 
model. Fear motivation is defined as motivation to avoid a negative 
incentive or punishment, hope motivation as motivation to approach a 
positive incentive or reward, motivational relief as reduction in fear 
motivation following nonconfirmation or partial confirmation of an ex- 
pectation of punishment, and motivational disappointment as reduction 
in hope motivation following nonconfirmation or partial confirmation of 
an expectation of reward. This alternative conceptualization makes a 
clear distinction between cognitive expectations and hope and fear motivation
in that an expectation is assumed to be a necessary but not a sufficient
condition of motivation. Some research implications are considered.


In two recent volumes Mowrer
(1960a, 1960b) has presented a major
revision of his original two-factor
theory. The main aim of the present
paper is to show how certain basic
concepts in this revised two-factor
theory, viz., fear, hope, relief, and
disappointment, can be reinterpreted
in terms of a model which involves the
concepts of motive, expectation, and
incentive value (Feather, 1959a). It
will be argued that this model provides
an alternative conceptualization
to Mowrer's theory with differential
testable implications.

In his original two-factor theory
Mowrer (1947) distinguished between
solution or instrumental learning and
sign learning or conditioning. Solution
learning applied to the learning
of instrumental habits, and sign learning
to the learning of fears. Mowrer
assumed that the habits formed in
solution learning were strengthened
by reward, that this type of learning
was mediated by the action of the


1 I am indebted to R. P. McDonald for his
helpful comments and suggestions about this
paper.

central nervous system, and that it in- 
volved the skeletal musculature. In 
contrast, the learning of fears (or sign 
learning) was assumed to proceed by a 
principle of contiguity, to be mediated 
by the action of the autonomic nervous 
system, and to involve the glands and 
smooth muscles. This latter type of 
learning permitted an interpretation 
of the effects of punishment on be- 
havior, not in terms of a weakening of 
habits (cf., early Thorndike) but 
rather as involving the conditioning of 
fear. Adjustments of the organism 
were then assumed to be in the direction
of fear reduction. Mowrer
argued that, in passive avoidance learn- 
ing, the fear which was conditioned to 
response-correlated stimuli resulted in 
conflict and, if intense enough, in re- 
sponse inhibition. In active avoidance 
learning the organism was assumed to 
reduce fear elicited by external stimu- 
lation by active avoidance of the 
situation. 

Revised two-factor theory (Mowrer, 
1960a, 1960b) is, in a sense, more uni- 
fied than the original two-factor 
theory in that it no longer involves a 
distinction between sign learning and
solution learning. In this new theory 
all learning is sign learning, and solu- 
tion learning is a derivative thereof. 
Mowrer still refers to his theory as 
two-factored, however, because he 
assumes that there are two types of re- 
inforcement, viz., incremental or drive 
induction (the type of reinforcement 
involved in punishment) and decre- 
mental or drive reduction (the type of 
reinforcement involved in reward). 
The focus of the revised theory is on 
the learning of emotions. Mowrer’s interest
in the conditioning of emotions
is now broadened, and conditioned 
hopes and fears become the basic con- 
cepts in discussing the effects of re- 
wards and punishments on behavior. 
The emotions of relief and disappoint- 
ment also play important roles in the 
revised theory. 

The distinctions made by Mowrer 
between his concepts of fear, hope, re- 
lief, and disappointment are perhaps 
best represented in terms of a con- 
ditioning situation which involves a 
signal or conditioned stimulus (CS) 
and a shock or unconditioned stimu- 
lus (UCS). Shock onset is assumed to 
elicit pain and the emotional response 
of fear. After frequent pairings in 
which the CS overlaps the onset of 
shock, the signal is converted into a 
“danger signal” and the subject is able 
to react with fear before shock onset, 
i.e., fear becomes anticipatory. The 
emotional response called “relief” 
occurs when the danger signal is termi- 
nated and there is no shock onset. 
Relief corresponds to reduction in the 
conditioned fear, and is referred to by 
Mowrer as secondary reinforcement 
Type 1. It is apparent that both fear 
and relief are closely associated with 
the stage of conditioning involving 
shock onset. In contrast, both hope 
and disappointment are more closely 
associated with the stage of conditioning
involving shock offset. Shock
offset is accompanied by a subsiding of 
the emotional upset, or fear reduction. 
With frequent pairings in which the 
CS overlaps shock offset, the signal is 
converted into a “safety signal” and 
the subject is able to react with re- 
duction in fear before the offset of the 
shock, i.e., fear reduction becomes 
anticipatory. Mowrer calls this an- 
ticipatory response “hope,” and main- 
tains that hope is the basis of second- 
ary reinforcement Type 2. Disap- 
pointment corresponds to a recrudes- 
cence of fear (or helplessness) when 
the safety signal is terminated and 
there is no shock offset. In summary, 
fear is elicited when the danger signal 
is on, and relief occurs when the danger 
signal is removed. Hope is elicited 
when the safety signal is on, and dis- 
appointment occurs when the safety 
signal is removed. In the condition- 
ing paradigm considered, fear and re- 
lief, and hope and disappointment, re- 
late to different stages in the temporal 
sequence of events, viz., shock onset 
and shock offset, respectively. These 
similarities and differences in the four 
emotions, in relation to the condition- 
ing paradigm, are presented in Table 1. 
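
The four-way classification summarized in this paragraph can be written as a small lookup keyed by the kind of signal and by whether the signal comes on or goes off. This is only an illustrative encoding of the summary given here; the names `MOWRER_EMOTIONS` and `emotion_for` and the string labels are assumptions, not terms used by Mowrer.

```python
from typing import NamedTuple


class Emotion(NamedTuple):
    name: str
    fear_change: str  # direction of change in learned fear


# Keys: (kind of signal, signal event)
MOWRER_EMOTIONS = {
    ("danger", "onset"): Emotion("fear", "increase"),
    ("danger", "offset"): Emotion("relief", "decrease"),
    ("safety", "onset"): Emotion("hope", "decrease"),
    ("safety", "offset"): Emotion("disappointment", "increase"),
}


def emotion_for(signal: str, event: str) -> Emotion:
    """Return the emotion associated with a signal event in the
    conditioning paradigm described above."""
    return MOWRER_EMOTIONS[(signal, event)]


if __name__ == "__main__":
    for key, emotion in MOWRER_EMOTIONS.items():
        print(key, emotion.name, emotion.fear_change)
```

Grouping the entries by `fear_change` shows the pairing the paradigm implies: relief and hope both involve a decrease in learned fear, and fear and disappointment both involve an increase.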

It is apparent from Table 1 that 
secondary decremental reinforcement, 
or reduction in learned fear, is com- 
mon to the emotions of relief and hope. 
In contrast, secondary incremental re- 
inforcement, or increase in learned 
fear, is common to the emotions of fear 
and disappointment. Mowrer's re- 
vised two-factor theory is, therefore, 
basically fear-centered. Each of the 
four emotions involves a change in 
the strength of fear. In the case of 
fear and hope this change is anticipa- 
tory whereas for relief and disappoint- 
ment the change is not anticipatory. 
The analysis also indicates that while 
fear and relief may occur in the ab- 
sence of the primary, noxious event, 
TABLE 1
MOWRER’S CONCEPTS OF FEAR, HOPE, RELIEF, AND DISAPPOINTMENT
IN RELATION TO A CONDITIONING PARADIGM

                   Danger signal prior to shock onset   Safety signal prior to shock offset

Onset of signal    Fear                                 Hope
                   Increase in learned fear             Decrease in learned fear
                   Secondary punishment, Type 1         Secondary reinforcement, Type 2
                   Anticipatory                         Anticipatory

Offset of signal   Relief                               Disappointment
                   Decrease in learned fear             Increase in learned fear
                   Secondary reinforcement, Type 1      Secondary punishment, Type 2
                   Not anticipatory                     Not anticipatory


i.e., shock, both hope and disappoint- 
ment imply the presence of the pri- 
mary noxious event. Finally, we 
should note that Mowrer extends the 
above form of analysis to situations 
involving the appetitional drives of 
hunger and thirst, where hunger fear 
and thirst fear are assumed to be im- 
portant variables. 

As in the original two-factor theory, 
fear is assumed to mediate passive 
avoidance learning when conditioned 
to response-correlated stimuli and 
active avoidance learning when con- 
ditioned to independent, external 
stimuli. Hope, or anticipated fear re- 
duction, is now regarded as the basis 
of habit. Mowrer argues that when 
hope is conditioned to response- 
correlated stimuli it “feeds back” to 
facilitate the response. When hope is 
conditioned to independent, external 
stimuli, it is assumed to mediate ap- 
proach behavior. Revised two-factor 
theory therefore accounts for the 
effects of rewards and punishments on 
behavior not in terms of a strengthen- 
ing or weakening of associations, but 
rather as involving the facilitating and 
inhibiting effects on responses of con- 
ditioned hope and conditioned fear. 
As stated by Estes (1962), “. . . the 
overt behavior is appropriately modi- 
fied by the type of emotion it leads to, 
the organism tending to continue behaviors
that give rise to hope and desist
from those that give rise to fear
[p. 118].” 

Mowrer (1960b, p. 320) considers 
that revised two-factor theory has 
perhaps its “closest approximation”
in Tolman’s conception of learning 
(Tolman, 1932) which emphasizes the 
acquisition of sign-gestalts or expecta- 
tions. In fact, Mowrer (1960b) at 
times appears to identify hopes and 
fears with expectations. Thus he 
states, “In two-factor theory, ‘expec- 
tations’ are of two major varieties: 
hopes and fears, representing, re- 
spectively, anticipations of good and 
bad events (significates) to come [pp. 
325-326].” But it is important to
note that Mowrer’s “expectations” 
are rather different from Tolman’s. 
Tolman’s concept of an expectation is 
identified as an hypothesis about the 
implications of action in a situation, 
as a cognitive anticipation of “what
leads to what” if a particular course of 
action is taken. The cognitive or rep- 
resentational aspect of meaning (as 
distinct from the evaluative aspect) is 
discussed by Mowrer (1960a) in his 
second volume. There he introduces 
the concept of image which is defined 
as a conditioned sensation and which 
is used as the important basis for the 
cognitive and mnemonic aspects of 
learning. Mowrer’s discussion suggests
that he would identify a cognitive
expectation, in Tolman’s sense,
as an image. It is apparent, however,
that he is unhappy with the concept of 
a purely cognitive expectation since 
he believes that it leaves unanswered 
the important question of the relation- 
ship of actual behavior to cognition. 
Thus he repeatedly cites Guthrie’s 
(1952) criticism of Tolman’s learning 
theory, viz., “In his concern with what
goes on in the rat's mind, Tolman has 
neglected to predict what the rat will 
do. So far as the theory is concerned 
the rat is left buried in thought . . . 
[p. 143].” 

Mowrer’s solution to this problem is 
to consider expectations as having 
both dynamic (emotional) and cogni- 
tive (imaginal) aspects. Such a con- 
ceptualization, he claims, avoids the 
difficulty inherent in Tolman’s analy- 
sis, which may have arisen from a false 
antithesis between intellect and emotion.
For Mowrer (1960a) this antithesis
is unjustified, for in thinking,
affective and cognitive components
are assumed to be interwoven. His
argument is perhaps clearest in his 
discussion of vicarious trial and error 
behavior, 


we must be careful not to leave the rat at 
the choice point “lost in thought.” We
must somehow get him “going” again, and 
eventually to his goal. If, in thought, we are 
merely dealing with expectancies in the sense 
of “pure cognitions,” there is an acute problem 
here. But if, instead, we view these expec- 
tancies more dynamically (as hopes and 
fears), then we have a basis for expecting 
thought to be closely related to, and to eventu- 
ate in, overt motion [p. 216]. 


It may be possible, however, to re- 
tain the concept of a purely cognitive 
expectation and to incorporate it into 
a theoretical model which avoids the 
criticism by Guthrie that the organism 
is left “lost in thought.” In one of his 
last papers Tolman (1955) attempted 
to do just this by specifying concepts 
of need, expectation, and valence
which were assumed to interact to 
determine performance. Tolman’s 
model is one of a class involving con- 
cepts akin to motive, expectation, and 
incentive value. Other such models 
have been developed by Lewin, 
Dembo, Festinger, and Sears (1944) 
in the analysis of level of aspiration 
behavior, by Rotter (1954) in his 
social learning theory, by subjectively 
expected utility (SEU) theorists in 
the analysis of decision making (Ed- 
wards, 1954), and by Atkinson (1957) 
in discussion of achievement motivation.²
The similarities and differences
between these models have
been summarized in a recent paper 
(Feather, 1959a). The existence of 
this class of models suggests the pos- 
sibility of an alternative approach to 
the conceptualization of fear, hope, 
relief, and disappointment, and it is to 
such an alternative that we now turn. 


AN ALTERNATIVE CONCEPTUALIZATION


We will now present as an alterna- 
tive to Mowrer's interpretation of fear, 
hope, relief, and disappointment, a 
theory which employs the concepts of 
fear motivation and hope motivation. 
These two concepts are not identified 
as the emotional responses of fear and 
hope, respectively. Rather they are to 
be considered as theoretical concepts 
which may be developed within a class 
of models involving the concepts of 
motive, expectation, and incentive 
value (Feather, 1959a). However, we 
would expect measures of fear motivation
and hope motivation to correlate
positively with measures of the emotional
responses of fear and hope,
respectively.

² The level of aspiration model, the social
learning model, and the SEU decision model
do not explicitly include a concept of motive,
but Atkinson (1958, p. 305) argues that concepts
such as valence, reinforcement value,
and utility can be conceived as the multiplicative
combination of motive and incentive
value.

The particular motive-expectancy- 
value model presented in this paper is 
based on Atkinson's (1957) theory of 
achievement motivation, and a ver- 
sion of it, applied to the achievement 
context, has been used by the writer 
in the analysis of persistence (Feather, 
1961, 1962). Motive is conceived as a 
relatively stable personality disposi- 
tion which may, in some cases, have 
an innate basis (cf., Eysenck’s, 1957, 
concept of a predisposition to emo- 
tionality), but which is more likely the 
product of early learning, possibly 
according to principles formulated 
by McClelland (1951, pp. 441-475). 
More specifically, motives are con- 
ceived as dispositions within the per- 
son to approach certain classes of ob- 
jects or events and to avoid certain 
other classes of objects or events. The 
objects or events which are ap- 
proached are called positive incentives 
or rewards; the objects or events which 
are avoided are called negative incen- 
tives or punishments. Expectations 
and incentive values are assumed to be 
more closely related to aspects of a 
situation. An expectation is conceived 
as a cognition about the consequences 
of behavior in a situation, a sign- 
significate relationship which captures 
the idea of “what leads to what.” Its 
strength may be indexed in terms of a 
subjective probability about the oc- 
currence of the consequence, given the 
act (Rotter, 1954; Atkinson, 1957). 
This concept of expectation has been
formalized as an $S_1$-$R_1$-$S_2$ representation
by MacCorquodale and Meehl
(1953) and recent studies have in- 
vestigated some of the factors which 
influence its strength (cf., Feather, 
1963). The value of an incentive is 
assumed to be related to qualitative 
and quantitative aspects of a reward
or punishment (e.g., amount of food,
palatability of food, intensity and/or 
duration of shock). In most sit- 
uations incentive values are prob- 
ably independent of expectations. 
In an achievement situation, however, 
where performance is evaluated against 
standards of excellence, incentive 
values of success and failure are re- 
lated to subjective probability of suc- 
cess (Feather, 1959b). 

The above concepts are in no way 
teleological. Each refers to a present 
condition, and measures of the strength 
of motives, expectations, and incen- 
tive values can be, and have been, de- 
veloped. Atkinson and Litwin (1960), 
for example, have examined the con- 
struct validity of the Test of Insight 
(French, 1958) and the Test Anxiety 
Scale (Mandler & Sarason, 1952) as 
measures of the strength of the mo- 
tives to achieve success and to avoid 
failure respectively. Two recent vol- 
umes edited by Lindzey (1958) and 
by Atkinson (1958) have considered 
problems in the assessment of human 
motives. A variety of papers con- 
cerned with the measurement of sub- 
jective probability are now available, 
some of which present direct methods 
of measurement (e.g., Adams & 
Adams, 1961; Feather, 1963; Gal- 
anter, 1962), while others employ 
more indirect approaches (e.g., Becker, 
1962; Edwards, 1962). Finally, there 
is an increasing literature, particu- 
larly from decision theory, on the 
measurement of incentive values or 
utilities (e.g., Becker, 1962; Edwards, 
1954; Galanter, 1962). 

Motives, expectations, and incen- 
tive values are assumed to combine 
(perhaps multiplicatively) to deter- 
mine motivation either to approach a 
positive incentive or to avoid a nega- 
tive incentive. For example, in 
Atkinson’s (1957) theory of achieve- 
ment motivation, the motivation to 
achieve success is taken as the multiplicative
combination of motive to
achieve success, subjective probability
(or expectation) of success, and
positive incentive value of success, and 
the motivation to avoid failure is 
taken as the multiplicative combina- 
tion of motive to avoid failure, sub- 
jective probability (or expectation) of 
failure, and negative incentive value 
of failure. 

In the motive-expectancy-value 
model we identify hope motivation as 
motivation to approach a positive in- 
centive or reward. Hope motivation 
is therefore not equated with expecta- 
tion of a reward nor is it considered in 
terms of fear reduction. Rather, ex- 
pectation of reward is taken as a 
necessary but not a sufficient condi- 
tion of hope motivation. Hope mo- 
tivation, in the sense of motivation to 
approach a reward, is also assumed to 
depend upon the strength of the rele- 
vant motive and the magnitude of the 
positive incentive value. In the case 
of a hungry child, for example, who 
has learned to obtain cookies from a 
jar on the pantry shelf, strength of 
hope motivation (and its emotional 
correlate of hope) would depend on 
the strength of the child’s motive to 
approach food (a relatively stable 
personality disposition related to early 
learning), the degree to which he ex- 
pects to find the cookies in the jar (an 
expectation based on past experience 
and influenced by the present situa- 
tion), and the positive incentive value 
of the food (related to the number and 
quality of the cookies). 
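
The dependence of hope motivation on all three variables can be illustrated numerically. The following is a minimal sketch, assuming the multiplicative combination that the text offers only as a possibility ("perhaps multiplicatively"). The function name `approach_motivation` and the numeric values assigned to the hungry child are hypothetical and are not taken from the source.

```python
def approach_motivation(motive: float, expectation: float, incentive_value: float) -> float:
    """Assumed multiplicative combination of motive, expectation, and incentive value.

    `expectation` is treated as a subjective probability between 0 and 1.
    """
    if not 0.0 <= expectation <= 1.0:
        raise ValueError("expectation must lie between 0 and 1")
    return motive * expectation * incentive_value


# Hypothetical values: a fixed motive to approach food and a fixed incentive
# value of the cookies, under three levels of expectation.
no_expectation = approach_motivation(motive=2.0, expectation=0.0, incentive_value=3.0)
weak_expectation = approach_motivation(motive=2.0, expectation=0.4, incentive_value=3.0)
strong_expectation = approach_motivation(motive=2.0, expectation=0.8, incentive_value=3.0)

# Expectation alone is not sufficient: with no motive, hope motivation is zero.
no_motive = approach_motivation(motive=0.0, expectation=0.8, incentive_value=3.0)

print(no_expectation, weak_expectation, strong_expectation, no_motive)
```

Under this assumed form, expectation of reward is necessary (zero expectation gives zero motivation) but not sufficient (zero motive or zero incentive value also gives zero motivation), which is the relationship the paragraph describes.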

In a corresponding way, fear motiva- 
tion is identified as motivation to avoid 
a negative incentive or punishment. 
In the conditioning paradigm con- 
sidered previously, we would argue 
that the amount of fear motivation 
(and its emotional correlate of fear), 
which is elicited with onset of the 
danger signal, will depend on the degree
to which the punishment is expected.
But we would further maintain
that the amount of aroused fear
motivation is also a function of the 
intensity of the shock and its duration 
(aspects of the punishment which 
should affect its negative incentive 
value), and of a relatively stable dis- 
position (or motive) to avoid the 
punishment. 

Provided expectations and incen- 
tive values are independent, confirma- 
tion of an expectation of reward 
should lead to an increase in the 
strength of the expectation and, hence, 
to an increase in hope motivation. 
For example, if the hungry child 
found cookies in the jar, his expecta- 
tion would be confirmed and strength- 
ened. When he is again hungry and 
decides to get the cookies, hope moti- 
vation would be stronger due to the 
increased strength of the expectation 
of reward. In a corresponding way, 
provided expectations and incentive 
values are independent, confirmation 
of an expectation of punishment 
should determine an increase in fear 
motivation. The repeated occurrence 
of shock following the danger signal 
would lead to an increase in the 
strength of the expectation of punish- 
ment and, hence, to a higher level of 
fear motivation when the danger sig- 
nal is presented. 

Increases in hope and fear motiva- 
tion may also occur when expectations 
are “overconfirmed” in the sense that
there is an unexpected increase in the 
quality or amount of the reward or 
punishment. Here the increase in 
motivation appears to be determined 
mainly by the increase in incentive 
value although, since expectations 
which are overconfirmed are presum- 
ably also strengthened, increases in 
motivation may also be determined by 
the strengthened expectations. One 
aspect of Crespi’s (1942) classic investigation
indicates the improvement
in performance which follows a sudden 
increase in the amount of reward. His 
interpretation of this effect (Crespi, 
1944), in terms of an increase in 
“emotional drive” or “eagerness,” is 
analogous to the increase in hope 
motivation which, in the present 
model, is assumed to follow overcon- 
firmation of a reward. 

Relief and disappointment may also 
be conceptualized in terms of the 
motive-expectancy-value model.
Motivational relief is assumed to occur
when nonconfirmation of an expecta- 
tion of punishment determines a de- 
crease in fear motivation (i.e., a de- 
crease in motivation to avoid the 
punishment). Thus, for example, 
when shock does not follow the danger 
signal, the expectation of punishment 
is not confirmed and fear motivation 
would decrease. Motivational relief 
is identified as this decrease in fear 
motivation, and we would expect 
measures of motivational relief to 
correlate positively with measures of 
relief conceived as an emotional 
response. 

Correspondingly, motivational dis- 
appointment is assumed to occur when 
nonconfirmation of an expectation of 
reward determines a decrease in hope 
motivation (i.e., a decrease in motivation
to approach the reward). Thus,
for example, if the hungry child finds 
that there are no cookies in the jar, 
his expectation of reward is not con- 
firmed and hope motivation would de- 
crease. Motivational disappointment 
is identified as this decrease in hope 
motivation, and we would expect 
measures of motivational disappoint- 
ment to correlate positively with 
measures of disappointment conceived 
as an emotional response. 

Provided expectations and incen- 
tive values are independent, repeated 
nonconfirmation of an expectation of
punishment should determine con- 
tinued motivational relief and a pro- 
gressive decrease in fear motivation. 
Similarly, provided expectations and 
incentive values are independent, re- 
peated nonconfirmation of an expec- 
tation of reward should determine con- 
tinued motivational disappointment 
and a progressive decrease in hope 
motivation. 

Both motivational relief and moti- 
vational disappointment are therefore 
assumed to follow nonconfirmation of
expectations and to involve reduction 
in fear motivation and hope motiva- 
tion, respectively. An expectation of 
reward or punishment is not confirmed 
when the expected reward or punish- 
ment does not eventuate, i.e., there 
is no reward or punishment. Moti- 
vational relief and motivational dis- 
appointment would also occur under 
conditions where partial confirma- 
tion of an expectation leads to a 
decrease in motivation, that is, 
where the expected reward or pun- 
ishment does occur but in reduced 
amount or quality. An expectation 
of reward would be only partially 
confirmed if the amount of the 
expected reward were suddenly reduced
(cf., Crespi’s experiment, 1942),
or if there was a sudden reduction in 
the quality of the expected reward 
(cf., Tinklepaugh’s experiment, 1928). 
Partial confirmation of an expectation 
of reward would determine a decrease 
in hope motivation, i.e., motivational 
disappointment. Similarly, if an ex- 
pectation of punishment were only 
partially confirmed, due to an unex- 
pected decrease in the intensity of 
punishment, there would be a decrease 
in fear motivation, i.e., motivational 
relief.

In summary, motivational relief
and motivational disappointment are
defined in terms of reduction in fear
motivation and hope motivation, respectively,
where this reduction is determined
by nonconfirmation or partial
confirmation of the corresponding
expectation. In contrast, provided incentive
values and expectations are independent,
the development of increments
in fear motivation and hope
motivation would follow confirmation
or overconfirmation of expectations.
The similarities and differences in
hope motivation, fear motivation,
motivational disappointment, and motivational
relief are summarized in
Table 2 for the more usual case where
incentive values and expectations are
assumed to be independent.

TABLE 2
FEAR MOTIVATION, HOPE MOTIVATION, MOTIVATIONAL RELIEF, AND MOTIVATIONAL
DISAPPOINTMENT IN RELATION TO CONFIRMATION AND NONCONFIRMATION OF
EXPECTATIONS WHEN INCENTIVE VALUES AND EXPECTATIONS ARE INDEPENDENT

                                     | Expectation of punishment   | Expectation of reward
Confirmed or overconfirmed           | Increase in fear motivation | Increase in hope motivation
Not confirmed or partially confirmed | Motivational relief         | Motivational disappointment

Note. Strength of motivation is assumed to be
positively related to strength of motive, level of
expectation, and magnitude of incentive value.

It is apparent from Table 2 that fear 
motivation and motivational relief 
form a pair, and that hope motivation 
and motivational disappointment form 
a pair. It is important to note that, 
unlike Mowrer’s approach, these four 
concepts are not defined as different 
aspects of the fear response. Hope 
motivation is not anticipated fear re- 
duction, nor is motivational disap- 
pointment considered to be a recrudes- 
cence of the emotion of fear. Instead 
the four concepts are explicated within 
the framework of a motive-expec- 
tancy-value model. 

The preceding discussion has been
concerned with the more common case 
of the motive-expectancy-value model, 
where incentive values and expecta- 
tions are assumed to be independent. 
It is reasonable to assume, for ex- 
ample, that the positive incentive 
value of food is not related in any sys- 
tematic way to the expectation of 
food, or that the negative incentive 
value of shock is not systematically 
related to the expectation of shock. 
But there are situations where we 
would expect incentive values and ex- 
pectations to be related (cf., Feather, 
1959a). One such situation is the 
achievement situation where a per- 
son's performance at a task can be 
evaluated against standards of ex- 
cellence. Here we would expect the 
positive incentive value of success to 
be greater for success at a difficult 
task (low expectation of success) than 
for success at an easy task (high expectation
of success). Correspondingly,
we would expect the negative incentive
value of failure at a task to be
greater for failure at an easy task 
(low expectation of failure) than for 
failure at a difficult task (high expecta- 
tion of failure). These dependen- 
cies between incentive values and ex- 
pectations are assumed in the theory 
of achievement motivation (Atkinson, 
1957; Feather, 1962). It is inter- 
esting to consider the concepts of 
hope motivation, fear motivation, 
motivational disappointment, and mo- 
tivational relief with respect to this 
particular motive-expectancy-value
model. 

In the theory of achievement motivation
“hope for success” motivation
is taken as the multiplicative combination
of motive to achieve success
($M_s$), expectation of success ($P_s$), and
positive incentive value of success
($I_s$). Similarly, “fear of failure” motivation
is taken as the multiplicative
combination of motive to avoid failure
($M_{af}$), expectation of failure ($P_f$),
and negative incentive value of failure
($I_f$). In the model, the positive incentive
value of success is taken as the
complement of the subjective probability
of success (i.e., $I_s = 1 - P_s$), and
the negative incentive value of failure
is taken as minus the complement of
the subjective probability of failure,
i.e., $I_f = -(1 - P_f)$. Hence the
theory makes the quite explicit assumption
that incentive values and
expectations are related. It follows
from these assumptions that hope for
success motivation is curvilinearly
related to subjective probability of
success ($P_s$), increasing to a maximum
value as $P_s$ increases to .50, and thereafter
decreasing in value as $P_s$ further
increases. Similarly, fear of failure
motivation is curvilinearly related to
subjective probability of failure ($P_f$),
increasing to a maximum value as $P_f$
increases to .50, and thereafter decreasing
in value as $P_f$ further increases.
These curvilinear relationships
are indicated in Table 3 for different
motive strengths.
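
The values in Table 3 follow mechanically from the definitions just given. The sketch below recomputes them, assuming only the formulas stated in the text and in the note to Table 3. The function names and the row layout are illustrative choices, not part of the original paper.

```python
def hope_for_success(m_s: float, p_s: float) -> float:
    """Hope for success motivation: M_s x P_s x I_s, with I_s = 1 - P_s."""
    i_s = 1.0 - p_s
    return m_s * p_s * i_s


def fear_of_failure(m_af: float, p_f: float) -> float:
    """Fear of failure motivation: M_af x P_f x I_f, with I_f = -(1 - P_f)."""
    i_f = -(1.0 - p_f)
    return m_af * p_f * i_f


def table3_rows():
    """Rows of (P_s, hope at M_s=1, hope at M_s=2, P_f, fear at M_af=1, fear at M_af=2)."""
    rows = []
    for tenths in range(9, 0, -1):
        p_s = tenths / 10
        p_f = round(1.0 - p_s, 1)  # each row of Table 3 pairs P_s with P_f = 1 - P_s
        rows.append((
            p_s,
            round(hope_for_success(1, p_s), 2),
            round(hope_for_success(2, p_s), 2),
            p_f,
            round(fear_of_failure(1, p_f), 2),
            round(fear_of_failure(2, p_f), 2),
        ))
    return rows


if __name__ == "__main__":
    for row in table3_rows():
        print(row)
```

Rounding to two decimals is used only to match the precision of the printed table. Doubling the motive strength doubles every entry, so the location of the maximum at .50 does not depend on motive strength.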

Table 3 implies that, insofar as the
theory of achievement motivation is
concerned, the occurrence of motivational
relief or motivational disappointment
will depend upon the
strength of the corresponding expectation.
Motivational relief would occur
when nonconfirmation of a weak expectation
of failure determines a decrease
in fear of failure motivation.
Let us assume, for example, that a
person with a weak expectation of failure
(e.g., $P_f$ = .30) succeeds at a task
(i.e., the expectation is not confirmed).
This success should determine a decrease
in his expectation of failure at
the task for future attempts (e.g., $P_f$
may decrease from .30 to .20). Table
3 shows that such a decrease in a weak
expectation of failure (from $P_f$ = .30
to $P_f$ = .20) would determine a decrease
in fear of failure motivation,
i.e., motivational relief, and that this
decrease would be greater for a stronger
motive to avoid failure.

TABLE 3
RELATIONSHIPS OF HOPE FOR SUCCESS MOTIVATION AND FEAR OF FAILURE
MOTIVATION TO EXPECTATIONS OF SUCCESS AND FAILURE

      |       | Hope for success      |       |       | Fear of failure
      |       | motivation when       |       |       | motivation when
$P_s$ | $I_s$ | $M_s$ = 1 | $M_s$ = 2 | $P_f$ | $I_f$ | $M_{af}$ = 1 | $M_{af}$ = 2
.9    | .1    | .09       | .18       | .1    | -.9   | -.09         | -.18
.8    | .2    | .16       | .32       | .2    | -.8   | -.16         | -.32
.7    | .3    | .21       | .42       | .3    | -.7   | -.21         | -.42
.6    | .4    | .24       | .48       | .4    | -.6   | -.24         | -.48
.5    | .5    | .25       | .50       | .5    | -.5   | -.25         | -.50
.4    | .6    | .24       | .48       | .6    | -.4   | -.24         | -.48
.3    | .7    | .21       | .42       | .7    | -.3   | -.21         | -.42
.2    | .8    | .16       | .32       | .8    | -.2   | -.16         | -.32
.1    | .9    | .09       | .18       | .9    | -.1   | -.09         | -.18

Note. $M_s$ = Strength of motive to achieve success; $M_{af}$ = Strength of motive to avoid failure; $I_s = 1 - P_s$;
$I_f = -(1 - P_f)$; Hope for success motivation = ($M_s \times P_s \times I_s$); Fear of failure motivation = ($M_{af} \times P_f \times I_f$).

Correspondingly, motivational disappointment
would occur when nonconfirmation
of a weak expectation of
success determines a decrease in hope
for success motivation. Let us assume,
for example, that a person with
a weak expectation of success (e.g.,
$P_s$ = .30) fails at a task (i.e., the expectation
is not confirmed). This failure
should determine a decrease in his expectation
of success at the task for
future attempts (e.g., $P_s$ may decrease
from .30 to .20). Table 3 shows
that such a decrease in a weak expectation
of success (from $P_s$ = .30 to
$P_s$ = .20) would determine a decrease
in hope for success motivation, i.e.,
motivational disappointment, and
that this decrease would be greater for
a stronger motive to achieve success.
In other words, motivational relief
would follow success at a task considered
to be easy, and motivational
disappointment would follow failure
at a task considered to be difficult.

But what would happen if a strong
expectation of failure ($P_f$ > .50) or a
strong expectation of success ($P_s$ > .50)
were not confirmed? The theory of
achievement motivation implies that
this condition would not determine
motivational relief or motivational
disappointment but, rather, increments
in fear of failure and hope for
success motivation, respectively. For
example, if a person with a strong expectation
of failure (e.g., $P_f$ = .70)
were to succeed at a task, this success
would determine a decrease in his expectation
of failure at the task for
future attempts (e.g., $P_f$ may decrease
from .70 to .60). Table 3 shows
that such a decrease in a strong expectation
of failure (from $P_f$ = .70 to
$P_f$ = .60) would determine an increase
in fear of failure motivation and that
this increase would be greater for a
stronger motive to avoid failure.
Similarly, hope for success motivation
would increase following nonconfirmation
of a strong expectation of
success, the increase being greater for
a stronger motive to achieve success.
In other words, an increase in fear of
failure motivation would follow success
at a task considered to be difficult,
and an increase in hope for success
motivation would follow failure
at a task considered to be easy.

Hence, in the theory of achievement 
motivation, which assumes dependen- 
cies between incentive values and ex- 
pectations, nonconfirmation of an ex- 
pectation does not necessarily deter- 
mine motivational relief or motiva- 
tional disappointment. The defini- 
tion of these two concepts involves 
not only nonconfirmation of an ex- 
pectation but also a decrease in the 
corresponding motivation. These 
two requirements are met in the 
theory of achievement motivation 
only when weak expectations are 
not confirmed. By the same to- 
ken, in the theory of achievement 
motivation, confirmation of an ex- 
pectation does not necessarily deter- 
mine increases in hope for success or 
fear of failure motivation. Assuming 
that confirmation of an expectation 
would increase its strength, increases 
in hope for success or fear of fail- 
ure motivation would occur only 
if a weak expectation (i.e., $P_s$ or
$P_f$ < .50) were confirmed. Confirmation
of a strong expectation of success
($P_s$ > .50) or a strong expectation
of failure ($P_f$ > .50) would determine
decreases in hope for success and fear 
of failure motivation, respectively. 
The general rule is that, whenever 
confirmation or nonconfirmation of an 
expectation of success or failure shifts 
the strength of the expectation towards
the intermediate value (i.e.,
$P_s$ = $P_f$ = .50), then the corresponding
motivation increases. But, whenever
confirmation or nonconfirmation
of an expectation of success or failure 
shifts the strength of the expectation 
away from the intermediate value, 
then the corresponding motivation 
decreases. 
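
The general rule can be checked numerically. The following sketch assumes the achievement-motivation formulas given in the note to Table 3 and works with the magnitude of motivation, so that hope for success and fear of failure can share one function. The function names are hypothetical. Two of the shifts come from the examples in the text (.30 to .20 and .70 to .60); the other two are hypothetical confirmation shifts added for comparison.

```python
def motivation_strength(motive: float, p: float) -> float:
    """Magnitude of hope for success or fear of failure motivation: M * P * (1 - P)."""
    return motive * p * (1.0 - p)


def change_in_motivation(motive: float, p_before: float, p_after: float) -> float:
    """Positive result: motivation increases. Negative result: it decreases."""
    return motivation_strength(motive, p_after) - motivation_strength(motive, p_before)


def shifts_toward_midpoint(p_before: float, p_after: float) -> bool:
    """True when the expectation ends up closer to .50 than it started."""
    return abs(p_after - 0.5) < abs(p_before - 0.5)


# (before, after): the first two shifts follow the text's nonconfirmation
# examples; the last two are hypothetical confirmation shifts.
shifts = [(0.30, 0.20), (0.70, 0.60), (0.30, 0.40), (0.70, 0.80)]
for p_before, p_after in shifts:
    delta = change_in_motivation(1.0, p_before, p_after)
    print(p_before, p_after, shifts_toward_midpoint(p_before, p_after), round(delta, 2))
```

In each case the sign of the change agrees with the direction of the shift relative to .50, and the size of the change scales with the strength of the motive.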

The relationships of fear of failure
motivation, hope for success motivation,
motivational relief, and motivational
disappointment to confirmation
and nonconfirmation of expectations,
in the theory of achievement motivation,
are summarized in Table 4.

TABLE 4
FEAR OF FAILURE MOTIVATION, HOPE FOR SUCCESS MOTIVATION, MOTIVATIONAL RELIEF,
AND MOTIVATIONAL DISAPPOINTMENT IN RELATION TO CONFIRMATION AND
NONCONFIRMATION OF EXPECTATIONS WHEN INCENTIVE VALUES
AND EXPECTATIONS ARE RELATED

              | Strong expectation of failure ($P_f$ > .50) | Weak expectation of failure ($P_f$ < .50) | Weak expectation of success ($P_s$ < .50) | Strong expectation of success ($P_s$ > .50)
Confirmed     | Decrease in fear of failure motivation      | Increase in fear of failure motivation    | Increase in hope for success motivation   | Decrease in hope for success motivation
Not confirmed | Increase in fear of failure motivation      | Motivational relief                       | Motivational disappointment               | Increase in hope for success motivation

Table 4 is obviously more complex 
than Table 2. This greater complexity 
is an outcome of the dependencies be- 
tween incentive values and expecta- 
tions assumed in the theory of achieve- 
ment motivation which, together with 
the assumption of multiplicative com- 
bination of variables, determine curvi- 
linear relationships between strength 
of motivation and level of expectation. 
In the more general case of the motive- 
expectancy-value model, strength of 
motivation is assumed to be positively 
related to strength of motive, level 
of expectation, and magnitude of in- 
centive value. The variables are 
assumed to be independent and, hence, 
the relationships of fear motivation, 
hope motivation, motivational relief, 
and motivational disappointment to 
confirmation, and nonconfirmation of 
expectations (as summarized in Table 
2) are of a simpler order. 
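
The curvilinear relationship in the achievement case follows directly from combining the two assumptions. As a brief worked derivation (the symbols $T_s$ and $T_f$ for hope for success and fear of failure motivation are introduced here for convenience and do not appear in the original):

```latex
\begin{aligned}
T_s &= M_s \times P_s \times I_s = M_s P_s (1 - P_s), \\
\frac{dT_s}{dP_s} &= M_s (1 - 2P_s) = 0
  \quad\Longrightarrow\quad P_s = .50, \qquad T_s^{\max} = \tfrac{1}{4} M_s, \\
T_f &= M_{af} \times P_f \times I_f = -M_{af} P_f (1 - P_f),
  \qquad |T_f|^{\max} = \tfrac{1}{4} M_{af} \ \text{at}\ P_f = .50 .
\end{aligned}
```

With $M_s$ = 1 this gives the peak value of .25 at $P_s$ = .50 shown in Table 3. In the independent case no such turning point arises, since motivation simply rises with expectation when motive and incentive value are held fixed.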


SOME EXPERIMENTAL EVIDENCE 


An examination of Table 3 indicates
that the motivational disappointment
which would occur when a relatively
weak expectation of success ($P_s$ < .50)
is reduced in strength to a very weak
expectation of success (e.g., $P_s$ = .10)
by nonconfirmation would be greater
when the weak expectation is high in
value (e.g., $P_s$ = .40) rather than low
in value (e.g., $P_s$ = .20), and would
be greater for a strong motive to
achieve success (e.g., $M_s$ = 2) than
for a weak motive to achieve success
($M_s$ = 1). More specifically, we can
advance the following two hypotheses:

Hypothesis 1 states that, providing 
expectations of success are relatively 
weak, for a given strength of the mo- 
tive to achieve success, motivational 
disappointment accompanying reduc- 
tion of a weak expectation of success 
to a low value should be positively re- 
lated to the initial expectation of 
success. 

Hypothesis 2 states that, providing 
expectations of success are relatively 
weak, for a given initial expectation of 
success, motivational disappointment 
accompanying reduction of a weak 
expectation of success to a low value 
should be positively related to the 
strength of the motive to achieve 
success. 

Hypotheses 1 and 2 together in- 
volve a more general prediction to 
cover the case where both strength of 
motive and initial expectation of suc- 
cess vary among subjects. This pre- 
diction is as follows: 

Hypothesis 3 states that, providing
expectations of success are relatively 
weak, motivational disappointment 
accompanying reduction of a weak ex- 
pectation of success to a low value 
should tend to be positively related 
to strength of initial hope for success 
motivation. 

The predictions in Hypotheses 1 and 
2 both involve this principle since ini- 
tial hope for success motivation would 
be stronger for higher initial expecta- 
tions of success (cf., Hypothesis 1), 
and for stronger motives to achieve 
success (cf., Hypothesis 2). But 
Hypotheses 1 and 2 refer to more con- 
trolled situations where either the 
strength of motive to achieve success 
or the initial expectation of success is 
held constant. Hypothesis 3 is con- 
cerned with the more general situa- 
tion where both strength of motive to 
achieve success and initial expectation 
of success vary among subjects. 
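
The predictions in Hypotheses 1 and 2 can be illustrated with the values used in the text. This is a sketch only, assuming the formulas in the note to Table 3 and treating motivational disappointment as the simple difference between initial and final hope for success motivation. The function names and the default final expectation of .10 are illustrative assumptions.

```python
def hope_for_success(m_s: float, p_s: float) -> float:
    """Hope for success motivation with I_s = 1 - P_s, as in the note to Table 3."""
    return m_s * p_s * (1.0 - p_s)


def motivational_disappointment(m_s: float, p_initial: float, p_final: float = 0.10) -> float:
    """Decrease in hope for success motivation when P_s falls from p_initial to p_final."""
    return hope_for_success(m_s, p_initial) - hope_for_success(m_s, p_final)


# Hypothesis 1: motive held constant, initial expectation varied.
h1_low = motivational_disappointment(m_s=1, p_initial=0.20)
h1_high = motivational_disappointment(m_s=1, p_initial=0.40)

# Hypothesis 2: initial expectation held constant, motive varied.
h2_weak = motivational_disappointment(m_s=1, p_initial=0.40)
h2_strong = motivational_disappointment(m_s=2, p_initial=0.40)

print(round(h1_low, 2), round(h1_high, 2), round(h2_weak, 2), round(h2_strong, 2))
```

On these assumptions the decrease is .07 for an initial expectation of .20 and .15 for an initial expectation of .40, and it doubles when the motive doubles. Because the final level of motivation also scales with $M_s$, the ordering by initial hope for success motivation need not be perfect when both variables vary: $M_s$ = 2 with $P_s$ = .20 gives a higher initial motivation (.32) but a smaller decrease (.14) than $M_s$ = 1 with $P_s$ = .40 (.24 and .15). This is in keeping with the "tend to be" wording of Hypothesis 3.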

Although the above hypotheses 
have not been explicitly tested, some 
relevant evidence has recently been 
obtained by the writer in an investiga- 
tion of persistence (Feather, 1963). 
Sixty male subjects worked indi- 
vidually at an insoluble, unicursal 
puzzle with the opportunity of turn- 
ing to an alternative puzzle of the 
same type whenever they desired. 
The insoluble puzzle was presented to 
subjects as very difficult. Each sub- 
ject was, in fact, told that only 5% of 
university students were able to solve 
it. The alternative puzzle, which was 
soluble, was presented to the subjects 
as intermediate or average in diffi- 
culty, and each subject was told that 
50% of university students were able 
to solve it. This information about 
the difficulty levels of the two puzzles 
was given to each subject before he began
to work at the first puzzle. As a
check on the effectiveness of the fictitious
group norm procedure, each subject
was required to estimate his
chances of success for each puzzle 
using a rating scale numbered from 
0 to 100 in steps to 10. This probabil- 
ity estimate, obtained before the sub- 
ject commenced the puzzle, is assumed 
to indicate the strength of his initial 
expectation of success (P,). 

For each puzzle the task was to 
trace over all the lines of a diagram 
without lifting the pencil from the 
diagram and without tracing over any 
line twice. Copies of the insoluble 
first puzzle (Item 1), printed on cards, 
were placed in a stack in front of the 
subject. Similarly, copies of the solu- 
ble second puzzle (Item 2), printed on 
cards, were placed to one side of the 
subject, but he could not see the con- 
tent of Item 2 before he began to work 
at it. Each subject was allowed to 
work at an item for as many trials as 
he chose, taking up to 40 seconds for 
each trial. He could quit Item 1 
whenever he desired and turn to 
Item 2. The measure of persistence 
was the number of trials taken by the 
subject at Item 1 before turning to 
Item 2. 

Following the test of persistence, 
each subject completed a postper- 
formance questionnaire. Among other 
questions, the subject was asked how 
concerned he felt about succeeding at 
Item 1 (i.e., how much he desired to 
succeed), how disappointed he felt at 
failing at Item 1, how anxious or 
worried he felt about his performance 
at Item 1, and how annoyed he felt 
about his lack of progress at Item 1. 
These questions were presented in 
Likert form and required answers on 
a five-category scale which was scored 
from one to five in the direction of in- 
creasing strength of the feeling. To 
check on changes in expectation of 
success with repeated failure at Item 
1, each subject was also asked what 
he estimated his chances of success to 
be after he had completed about half 
the number of trials he took at Item 1, 
and also just before he finished work- 
ing at this item. These “middle” and 
“final” probability estimates were ob- 
tained using the same rating scale as 
was employed to obtain “initial” 
probability estimates prior to per- 
formance at Item 1. 

Need Achievement scores based 
on stories written to six pictures 
under neutral conditions according to 
the standard procedure (McClelland, 
Atkinson, Clark, & Lowell, 1953) pro- 
vided measures of the strength of 
motive to achieve success (M_S). 
Mandler-Sarason Test Anxiety scores 
provided measures of the strength of 
motive to avoid failure (M_AF). Both 
the projective test of n Achievement 
and the Test Anxiety Scale were ad- 
ministered to the subjects some weeks 
prior to the test of persistence. 

The experimental situation de- 
scribed above fulfills the conditions for 
the occurrence of motivational dis- 
appointment according to the theory 
of achievement motivation. Item 1 
was presented to the subjects as very 
difficult and, hence, initial expecta- 
tions of success should tend to be rela- 
tively weak. Furthermore, each trial 
taken by the subject at Item 1 re- 
sulted in failure since the puzzle was 
insoluble. Hence the weak initial 
expectation of success was not con- 
firmed by success and, by assumption, 
this weak expectation should tend to 
decrease in strength with repeated 
failures,³ and determine decreases in 
hope for success motivation. 

Table 5 presents intercorrelations 
between subjects’ ratings of their con- 
cern about succeeding at Item 1, 


3 Analysis of probability estimates reveals 
a tendency for an increase from the initial 
estimate to the middle estimate, followed by a 
decrease to the final estimate. This puzzling 
trend may be a function of the very low norm 
(5%) reported to the subjects. 


N. T. FEATHER 


TABLE 5 

INTERCORRELATIONS OF RATINGS OF ACHIEVEMENT CONCERN, DISAPPOINTMENT ABOUT 
FAILURE, ANXIETY ABOUT FAILURE, ANNOYANCE ABOUT FAILURE, AND 
INITIAL PROBABILITY ESTIMATES 
(N = 60) 

                               Disappointment   Anxiety        Annoyance      Initial 
                               about            about          about          probability 
                               failure          failure        failure        estimate 
Concern about achievement      .52***           .22            .29*           .14 
Disappointment about failure                    [illegible]    .35**          .28* 
Anxiety about failure                                          [illegible]    -.03 
Annoyance about failure                                                       .08 

* p < .05. 
** p < .01. 
*** p < .001. 


their disappointment about failure 
at Item 1, their anxiety about 
failure at Item 1, their annoyance 
about failure at Item 1, and their 
chances of success prior to perform- 
ance at Item 1 (initial probability 
estimates). 

Table 5 shows that ratings of 
achievement concern correlate r=.52 
(p < .001, df=58) with ratings of 
disappointment. If we assume that 
these ratings can be taken as measures 
of hope motivation and motivational 
disappointment, respectively, this re- 
sult is consistent with the predicted 
positive relationship stated in Hypoth- 
esis 3. Table 5 also shows that ratings 
of disappointment correlate r=.28 
(p < .05, df=58) with estimates of 
probability of success obtained prior 
to performance at Item 1. Assuming 
that these ratings can be taken as 
measures of motivational disappoint- 
ment and initial expectation of success 
respectively, this result is consistent 
with the predicted positive relation- 
ship stated in Hypothesis 1. Ratings 
of disappointment are also positively 
correlated with n Achievement scores 
(r=.18) but this correlation is not 
statistically significant. If we assume 
that these measures can be taken as 
indicating degree of motivational dis- 
appointment and strength of motive 
to achieve success, respectively, this 
lack of a significant positive relation- 
ship fails to support Hypothesis 2. 
These data are therefore consistent 
with the predictions stated in Hy- 
potheses 1 and 3. It is important to 
note, however, that the above ex- 
periment was not specifically designed 
as a test of the three hypotheses con- 
cerning motivational disappointment, 
and offers only suggestive evidence. 
The data contained in Table 5 are 
rather incidental to the main aim of 
the investigation which was to in- 
vestigate differences in persistence.⁴ 
However, more rigidly controlled in- 
vestigations of the above hypotheses 
should be possible. One might, for 
example, attempt to control the num- 
ber of failures the subjects undergo at 
the task rather than to allow this to 
vary, as in the present investigation, 
and one might also systematically vary 
initial expectations of success at the 
insoluble task. One might also try to 
devise alternative methods of meas- 
uring motivational disappointment 
and hope for success motivation addi- 
tional to the rather simple rating 
measures used in the above study. In 
future research, it should also be pos- 
sible to investigate predictions about 
motivational relief, in an achievement 
context, which parallel those stated in 
the above hypotheses. For example, 
according to the theory of achieve- 
ment motivation, we would predict 


4 Results indicate that persistence at Item 1 
is positively related to initial estimates of 
probability of success (P_s) for the subjects 
classified as high in n Achievement and low in 
Test Anxiety, but there is no relationship be- 
tween persistence and initial estimates of P_s 
for the subjects classified as low in n Achieve- 
ment and high in Test Anxiety. This result 
accords with the prediction based on the 
motive-expectancy-value model. 

that, providing expectations of failure 
are relatively weak, for a given 
strength of the motive to avoid failure, 
motivational relief accompanying re- 
duction of a weak expectation of fail- 
ure to a low value should be positively 
related to the initial expectation of 
failure. Such predictions could be in- 
vestigated in an achievement situa- 
tion where a subject experiences re- 
peated success at an easy task. 
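
The significance levels attached to the Table 5 correlations discussed above can be checked with the usual conversion of r to t, t = r * sqrt(df / (1 - r^2)). The following is a minimal sketch, not part of the original analysis. It assumes two-tailed tests and, as a close approximation to df = 58, uses the critical values tabled for df = 60; the function names are illustrative only.

```python
import math

# Assumption: two-tailed critical t values tabled for df = 60,
# used here as a close approximation for df = 58.
CRITICAL_T_DF60 = [(0.001, 3.460), (0.01, 2.660), (0.05, 2.000)]


def t_from_r(r, df):
    """Convert a product-moment correlation to a t statistic."""
    return r * math.sqrt(df / (1.0 - r * r))


def significance_label(r, df=58):
    """Return the strictest tabled level that |t| exceeds, or 'n.s.'."""
    t = abs(t_from_r(r, df))
    for alpha, critical in CRITICAL_T_DF60:
        if t > critical:
            return f"p < {alpha}"
    return "n.s."


if __name__ == "__main__":
    # Correlations quoted in the text: concern with disappointment,
    # disappointment with initial estimate, disappointment with n Achievement.
    for r in (0.52, 0.28, 0.18):
        print(r, round(t_from_r(r, 58), 2), significance_label(r))
```

Under these assumptions the sketch classifies the three correlations quoted in the text in the same way as the reported significance levels.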

Future research could also study the 
relationship of motivational disap- 
pointment to hope motivation, and 
motivational relief to fear motivation 
in situations other than the achieve- 
ment situation. As we have indicated, 
the achievement situation is rather 
exceptional in that the conceptualiza- 
tion of achievement related motiva- 
tion involves the assumption of de- 
pendencies between incentive values 
and expectations. The analysis for 
the more usual case, where incentive 
values and expectations are assumed 
to be independent, would be simpler. 
We would predict that, for this more 
usual case, motivational disappoint- 
ment accompanying reduction of an 
expectation of reward to a low value 
should tend to be positively related to 
the strength of initial hope motiva- 
tion. Similarly, motivational relief 
accompanying reduction of an expec- 
tation of punishment to a low value 
should tend to be positively related to 
strength of initial fear motivation. 
Unlike the hypotheses presented for 
the achievement situation, neither of 
these two predictions is qualified by an 
assumption about the strength of 
the initial expectation of reward or 
punishment. 

But what of the relationships be- 
tween hope motivation and fear mo- 
tivation, or between motivational re- 
lief and motivational disappointment? 
It seems to the writer that Mowrer’s 
revised two-factor theory would lead 
to the prediction of positive interrela- 
tionships between measures of the 
four emotions of fear, hope, relief, and 
disappointment on the basis of in- 
tensity of fear. When fear is strong, 
relief (or fear reduction) should be 
strong, hope (or anticipated fear 
reduction) should be strong, and 
disappointment (or fear induction) 
should be strong. The strength 
of hope, relief, and disappointment 
should tend to decline as fear becomes 
less intense. However, an analysis 
based on the motive-expectancy-value 
model need not lead to this prediction. 
In the first place there is no necessary 
assumption in the motive-expectancy- 
value model that reward is equivalent 
to fear reduction. In fact, the model 
is more hedonistic in its orientation. 
Secondly, even though it may be pos- 
sible to devise reward and punishment 
situations which determine relatively 
constant expectations and incentive 
values among the subjects, hope 
motivation and fear motivation would 
still be influenced by the particular 
motives which are aroused. These 
motives are not necessarily positively 
correlated. In fact, in the theory of 
achievement motivation (Atkinson, 
1957), it is assumed that the motive to 
achieve success (M_S) and the motive 
to avoid failure (M_AF) are independent 
dispositions of the personality. Sev- 
eral studies (Atkinson, 1958; Feather, 
1963), in which strength of M_S is 
inferred from analysis of TAT proto- 
cols and strength of M_AF is inferred 


5 The generally positive intercorrelations in 
Table 5 are not inconsistent with this predic- 
tion. An important exception is the absence 
of a significant positive correlation between 
ratings of anxiety about failure and achieve- 
ment concern. Furthermore, ratings of dis- 
appointment are the only measures to show a 
significant positive correlation with initial 
estimates of probability of success. These 
latter results are consistent with the deri- 
vation from the theory of achievement 
motivation. 

from scores on the Test Anxiety 
Questionnaire (Mandler & Sarason, 
1952), provide evidence which is con- 
sistent with this assumption. Thus, 
given expectations and incentive val- 
ues which are relatively uniform in 
strength among the subjects, predic- 
tion of the relationship between meas- 
ures of hope motivation and measures 
of fear motivation would depend on 
assumptions about the relationship of 
the underlying motives. In the pres- 
ent conceptualization these motives 
are taken as relatively stable per- 
sonality dispositions and need not be 
positively correlated. Nor are they 
considered to involve changes in the 
level of fear. 

The writer believes that the con- 
cepts of fear motivation, hope motiva- 
tion, motivational relief, and motiva- 
tional disappointment, which have 
been developed in this paper, provide 
an alternative to Mowrer’s con- 
ceptualization permitting differential, 
testable predictions. 

A theory based on Helson’s adaptation level formulation was presented 
that was capable of integrating the data on the problem of transposition 
of intermediate size. These data were analyzed and new research re- 
ported that revealed several phenomena of importance for any theory of 
intermediate size discrimination. The modes of response either repli- 
cated or were transposition, systematic preference for the 
absolute stimulus, neither absolute nor relational choice, equal prefer- 
ence for 2 test stimuli, and random response. The ratio theory was 
able to deal with the experimental results and to predict the points of 
transition from one mode of response to another. The limitations and 
the implications of the theory were discussed. 


THE INTERMEDIATE SIZE PROBLEM 


Transposition has been a central 
concern for theories designed to ex- 
plain the essential nature of the 
stimulus in discrimination learning. 
However, evidence opposed to all 
theories that have attempted to ac- 
count for transposition has been re- 
vealed by research on the intermediate 
size problem. The first study that 
employed three stimuli differing in 
size and simultaneously presented was 
that of Spence (1942) which showed 
that response in test was predomi- 
nantly to the specific stimulus on which 
the subjects (chimpanzees) had origi- 
nally been trained. Spence considered 
these data to be strong evidence 
against both a relational (Kohler, 
1955) and a configurational (Gulliksen 
& Wolfle, 1938) theory and extended 
his earlier formulation (Spence, 1937) 
to account for the intermediate size 


1 This paper is based on a dissertation sub- 
mitted to the Graduate Faculty of the New 
School for Social Research in partial fulfill- 
ment of the requirements for the PhD degree. 
The writer wishes to express his gratitude to 
the chairman of his dissertation committee, 
Jerome Wodinsky, for his interest and in- 
valuable advice during all stages of the 
development of this study. 

2 Now at Wellesley College. 


problem. However, Spence’s absolute 
theory of intermediate size discrimina- 
tion was challenged by the experiment 
of Gonzalez, Gentry, and Bitterman 
(1954). These authors clearly re- 
vealed that the occurrence of trans- 
position of intermediate size obtained 
in their study with chimpanzees was 
evidence against Spence’s theory just 
as the mere failure of transposition 
was obviously evidence against a 
relational approach. The results of 
both Spence and Gonzalez et al. were 
replicated in a series of experiments 
with monkeys by Gentry, Overall, and 
Brown (1959) and Brown, Overall, 
and Gentry (1959). 

As a solution to the problem of the 
breakdown of transposition, Steven- 
son and Bitterman (1955) stated: “In 
the course of a relational solution S 
may learn something about the ab- 
solute properties of the training 
situation, and transfer of the rela- 
tional solution may be impaired to the 
extent that these absolute properties 
are changed [p. 274].” They experi- 
mented with young children incapable 
of verbalizing the concept of inter- 
mediate size and found transposition 
in a “near” but not in a “far” test 
(the same phenomenon revealed by 
two-choice transposition studies). 
Gonzalez and Ross (1958) concluded 
that “the range of stimulus-equiva- 
lence seems to be a factor which limits 
the range of transposition [p. 746].” 

This approach of contemporary 
relational theorists may be descrip- 
tively accurate but does not come to 
grips with the empirical [evidence] 
of stimulus generalization in the inter- 
mediate size problem that has in- 
dicated that the basis of equivalence 
cannot be considered to be com- 
mon relational [properties]. Klüver 
(1933) defined stimuli eliciting the 
same response as equivalent, and 
those failing to result in the same 
response as nonequivalent. In these 
terms, any stimulus in the test set to 
which the response is made signifi- 
cantly more frequently than to the 
other test set members must be con- 
sidered equivalent to the training 
stimulus. Only if response is divided 
randomly among all three test stimuli 
can explanation be made in terms of 
nonequivalence. Since the data have 
revealed nonintermediate as well as 
intermediate choice, no theory can 
claim adequacy by dealing with one 
aspect of the problem, and failing to 
specifically explain an equally signifi- 
cant mode of response. 

James (1953), utilizing Helson’s 
adaptation level (AL) formulation as 
the basis for analysis of two-choice 
transposition studies, confessed that 
his theory was unable to account for 
the difficulty of learning the inter- 
mediate size problem. As a matter of 
fact, it is difficult to understand how 
James would explain transfer in the 
intermediate size problem given the 
fact of learning. In extending James’ 
theory to the intermediate size prob- 
lem, it might be assumed that the 
subject learns to approach the stimu- 
lus that is at the AL and to avoid 
stimuli above and below that level. 




Therefore, transposition could occur 
only when the intermediate stimulus 
of the test set is precisely at the 
neutral point and systematic response 
to a nonintermediate test stimulus 
could occur only if that stimulus were 
to be exactly at the AL. As soon as 
no test set member was precisely at 
that point, response would be random. 
While such an interpretation could be 
made, the data have indicated too 
wide a range of equivalence in the 
intermediate size problem for the 
position to be tenable. If James were 
to be interpreted as implying that the 
intermediate stimulus is on one side 
or the other of the AL, there would be 
no way (in his own terms) for the 
organism to discriminate that stimulus 
from its partner falling on the same 
side of the neutral point. 

Riley (1958) asserted that the 
stimulus in a brightness discrimina- 
tion is defined in terms of the ratio 
between the training stimulus and the 
background, and that a change in this 
ratio is responsible for the distance 
effect. The intermediate size problem 
would appear to be difficult to handle 
within the context of Riley's approach 
primarily because it is not clear as to 
how the basic assumption of this 
theory might be formulated to handle 
size discrimination. Many investiga- 
tors have demonstrated the independ- 
ence of the perception of size from the 
background, and a study that showed 
some effect of a surrounding frame on 
a luminous line (Rock & Ebenholtz, 
1959) indicated that the effect was not 
overly impressive. Without a back- 
ground term, the Riley theory is re- 
duced to either a traditional absolute 
or relational approach. 


THE RATIO THEORY 


The ratio theory of intermediate 
size discrimination is based essentially 
on the underlying postulates of Hel- 
son's adaptation level theory. These 
postulates are presented with appro- 
priate references for those that are 
taken over from the traditional adap- 
tation level conception. 

1. All behavior centers about the 
adaptation level of the organism 
(Helson, 1947, 1948, 1959). 

2. The adaptation level is depend- 
ent upon interaction of all stimuli 
confronting the organism and past 
stimulation. It is approximated as a 
weighted log mean of all stimuli 
affecting the organism (Helson, 1947, 
1948, 1959). 

3. All dimensions of present and 
residual stimuli are related to the 
adaptation level. Not all need neces- 
sarily be considered in determination 
of this level (Helson, 1947, 1959). 

4. Fixed stimuli do not have con- 
stant effects on the organism. Prop- 
erties of stimuli depend upon the 
relations of the stimuli to the prevail- 
ing adaptation level (Helson, 1938, 
1947, 1948, 1959). 

5. The basis of stimulus equivalence 
is perceived similarity. In order to 
be perceived as similar, stimuli must 
lie on the same psychological con- 
tinuum. 

The ratio theory consists of certain 
inferences from the five basic postu- 
lates. These hypotheses are presented 
as mathematical statements where 
appropriate. 

1. The adaptation level (AL) is the 
weighted log mean of series stimuli 
and residual AL. Let X_i represent 
the series stimuli, R the residual AL, 
y the constant applied to the log mean 
of the series stimuli, x the constant 
applied to the residual AL, y + x 
= 1.00, and n the number of series 
stimuli. The AL formula is: 

\[ \log AL = y \left( \frac{\sum \log X_i}{n} \right) + x \log R \]

(a) The subject enters training with a 
residual AL which exerts an extra- 
experimental effect upon the AL. 
With each successive trial, R becomes 
increasingly a function of the series 
stimuli since R is primarily dependent 
upon immediate past stimulation. 
The effect of irrelevant prior stimula- 
tion is negligible by the completion of 
training so that: 

\[ \log \text{training } AL = \frac{\sum \log X_i}{n} \]

(b) The AL on the first test trial 
immediately after the completion of 
training is expressed by the formula: 

\[ \log \text{test } AL = y \left( \frac{\sum \log X_i}{n} \right) + x \log \text{training } AL \]


2. Each stimulus in the series is 
defined as the ratio of the stimulus 
area to the AL. 

3. The subject learns in training to 
respond to the ratio of the positive 
stimulus to the AL. 

4. (a) Probability of response on 
the first test trial is a function of the 
degree of similarity between the 
individual test stimulus ratios and the 
positive training ratio whenever all 
of the test stimulus ratios are not 
either larger or smaller than the 
positive training ratio. (b) When all 
of the test stimulus ratios are either 
larger or smaller than the positive 
training ratio, response will be divided 
randomly among the test stimuli. 
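
The four hypotheses can be worked through numerically. The sketch below is illustrative only and is not the author's procedure: the function names are hypothetical, and because "degree of similarity" is not given a formula in the text, similarity is assumed here to be the absolute difference between the logarithms of a test ratio and the positive training ratio.

```python
import math


def log_mean(stimuli):
    """Mean of the base-10 logarithms of the stimulus areas."""
    return sum(math.log10(s) for s in stimuli) / len(stimuli)


def first_test_trial_al(training_set, test_set, y):
    """Hypothesis 1b: AL on the first test trial, with x = 1 - y."""
    x = 1.0 - y
    log_training_al = log_mean(training_set)  # Hypothesis 1a
    return 10 ** (y * log_mean(test_set) + x * log_training_al)


def predict_first_test_choice(training_set, positive, test_set, y):
    """Return (test ratios, predicted stimulus or None for random response)."""
    training_al = 10 ** log_mean(training_set)
    positive_ratio = positive / training_al            # Hypothesis 3
    test_al = first_test_trial_al(training_set, test_set, y)
    ratios = [s / test_al for s in test_set]           # Hypothesis 2
    all_larger = all(r > positive_ratio for r in ratios)
    all_smaller = all(r < positive_ratio for r in ratios)
    if all_larger or all_smaller:                      # Hypothesis 4b
        return ratios, None
    # Hypothesis 4a, under the ASSUMED log-distance similarity measure.
    distances = [abs(math.log10(r / positive_ratio)) for r in ratios]
    return ratios, test_set[distances.index(min(distances))]


if __name__ == "__main__":
    # Stimulus areas from Spence (1942); y = .40 as adopted later in the text.
    ratios, choice = predict_first_test_choice(
        (100, 160, 256), 160, (160, 256, 409), 0.40)
    print([round(r, 2) for r in ratios], choice)
```

A return value of None stands for the random response of Hypothesis 4b. The choice of a logarithmic distance is a design decision of this sketch; any other monotonic measure of closeness to the positive training ratio could be substituted in the last step.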

Helson (1947, 1948, 1959) stated 
that a basic factor in adaptive be- 
havior is the tendency to order stimuli 
by means of graded dichotomies 
(principle of bipolarity of response), 
and conceived of the AL as approxi- 
mated by the level of stimulation 
considered to be neutral or indifferent. 
James (1953) employed this concept 
in his formulation of an explanation 
for transposition and the distance 
effect in two-choice situations but his 
approach is inadequate to explain the 
intermediate size problem primarily 
because stimulus aspects are dichot- 
omized. In the current formulation 
the relationship of the individual 
stimulus to the AL has been inter- 
preted as the ratio of the stimulus 
dimensions to this level so as to yield 
the necessary specific values. It has 
also seemed reasonable in the light of 
the known independence of size per- 
ception from the background to 
eliminate this factor from the 
AL formula when dealing with this 
dimension. 

Consideration has been restricted 
to the initial test trial since repeated 
test trials introduce the following 
complicating variables: (a) a trial-to- 
trial shift in the residual reducing the 
effect of the training set with the 
concomitant change in the AL (Hy- 
pothesis 1a); (b) a trial-to-trial change 
in stimulus ratios (Hypothesis 2); 
(c) the trial-to-trial occurrence of new 
learning based on either the current 
ratio of the reinforced test stimulus 
(Hypothesis 3) or the unknown effect 
of nonreinforcement in test; (d) the 
problem of determining the most 
similar test ratio to the prevailing 
learned positive ratio (Hypothesis 4). 
Until these processes of interaction 
can be quantitatively evaluated, it 
has seemed necessary to restrict 
consideration to the first test trial. 

The current formulation contains 
both perceptual and learning con- 
structs. There are at least three not 
mutually exclusive possibilities of 
what is learned in the training 
situation: the AL, the two negative 
training ratios (the range of the train- 
ing set), and the positive stimulus 
ratio (the reinforced stimulus). The 
typical transposition study involves 
discriminative training followed by a 
test situation in which the response is 
used as an index of the essential nature 
of the positive stimulus. Since this 
is also the interest of, and the situation 
referred to by, the ratio theory, refer- 
ence has been made only to the 
specific nature of the learned stimulus 
(Hypothesis 3). 

The ratio theory states that re- 
sponse in test is solely a function of 
the similarity of the test ratios to the 
positive training ratio and, therefore, 
is in contrast to Spence’s (1942) con- 
ception which considers inhibition of 
response to negative stimuli to be a 
basic factor in the determination of 
choice behavior. Transfer is pri- 
marily on a positive basis: the subject 
learns to respond to a given percept, 
and will respond to the object per- 
ceived as most similar to the original 
in subsequent test. While learning 
about negative stimuli may and un- 
doubtedly does occur, the reanalysis 
of the data in the literature and the 
results of the author's experiments 
do not require consideration of the 
negative ratios in order to deduce the 
test response. 

Independent of the various factors 
that may be learnable and therefore 
can be the basis for several different 
kinds of analyses of discrimination 
studies, the present concern is with 
those variables that operate to con- 
stitute the stimulus as a percept. 
Included in the definition of the stimu- 
lus are two terms that indicate the 
relative effects of current stimulation 
and past perceptual experience. It is 
unclear, either within Helson’s formu- 
lation or the present theory, whether 
the residual represents perceptual 
learning or some central or peripheral 
adaptation phenomenon. Although it 
may be difficult to conceive of an 
adaptation effect for size operating in 
a manner similar to that for color, 
brightness, or pressure, one is re- 
minded that a striking aspect of 
figural aftereffects is the alteration of 
perceived size. The problem of the 
nature of the residual effect must 
remain open at this time. 

Since James’ analysis of two-choice 
transposition has appeared to be 
generally accurate as well as poten- 
tially compatible with the ratio theory, 
there is no necessity for any detailed 
consideration of this related problem. 
James’ use of the shift of level prin- 
ciple has accounted for the basic fact 
revealed by these studies: that is, 
transposition with sets similar to that 
used in training and the failure of 
transposition with a test situation 
considerably different from training. 
The primary reservation with regard 
to this conception is that it has not 
been formulated in a way that would 
permit explanation of the gradient of 
relational response revealed by Alberts 
and Ehrenfreund (1951) and Ehren- 
freund (1952). However, it would 
seem reasonable that the two-choice 
situation is simpler to learn than the 
three-stimulus design principally be- 
cause it does not require response to a 
specific stimulus ratio but rather can 
be dichotomized into two broad 
categories of stimuli above and below 
the AL. The fact that actual prefer- 
ence for a nonrelationally defined 
stimulus has never been unambigu- 
ously reported in two-choice trans- 
position is support for James's con- 
tention that response is either to the 
“relational” stimulus or is divided 
randomly between the two. This 
would also suggest that James’ ap- 
proach might be reformulated in 
terms of positive learning alone: the 
subject learns to approach a stimulus 
that is on one side of the AL. Logic- 
ally, if the subject has learned not to 
approach a stimulus having a certain 
relationship to the AL, he should do 
nothing in test when confronted with 
two stimuli having this same relation- 
ship. The insistence on negative 
learning in this case undermines the 
consistency of the theory while the 
restriction of consideration to positive 
aspects can explain transfer as well as 
random response in the event of 
neither or both stimuli lying on the 
“approach side” of the AL. 

Since the data suggest that two 
different training sets, in which the 
same relationally defined stimulus is 
reinforced, are sufficient to establish a 
relational learning set (Gonzalez & 
Ross, 1958; Hunter, 1952; Johnson & 
Zara, 1960), such designs are not 
central to the problem of the essential 
nature of the stimulus in a simple 
discrimination situation such as that 
in which a single training set is em- 
ployed. Such conditions might be 
considered as follows. When an 
organism is given a single discrimina- 
tion task, the response is to the stimu- 
lus as perceived. However, with two 
sets to learn concurrently in which 
vastly different stimuli are correct 
so that learning cannot be on the basis 
of common appearance, learning is 
instigated employing concepts rather 
than percepts and results in trans- 
position regardless of the absolute 
dimensions of the test stimuli. 


EXPERIMENTAL DATA 


The adequacy of the ratio theory 
relative to that of other conceptions 
can be examined provisionally by 
reference to the published experi- 
ments on the intermediate size prob- 
lem. For analysis of any experiment 
to be possible, data should be pre- 
sented in terms of first test trial 
response; however, only one study, 
Stevenson and Bitterman (1955), 
meets this criterion. Fortunately, the 
results of experiments by Spence 
(1942), Gonzalez et al. (1954), Gentry 
et al. (1959), and Brown et al. (1959) 
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TABLE 1 
DATA FROM SPENCE (1942) 
Training set | Test set | Test ratios Predicted Moan upoan fa Aat 
- i — - —ļ - — — nant — -~ — — 
8's ff ake j es s “i: Lendl eB aed ae | I | L — s | talie 
100 | 160 | 256 | 160 | 256 | 409 «83 | 1.35 | 2.12 | 160 (S) | 12.67 | 2.33 | 0.00 
160 | 256 | 409 | 100 | 160 | 256 AT -76 | 1.21 | 256 (L) «33 | 1.33 | 10.00 


Note,—Stimulus sizes in square centimeters. Positive training ratio = 1,00. 


which do not report first test trial 
data indicate little variability of re- 
sponse and are sufficiently clear-cut 
to warrant an interpretation. Two 
studies of intermediate size transposi- 
tion in addition to that of Gonzalez 
and Ross (1958) are not analyzed: 
Rudel (1957) and Reese (1962). 
Rudel reported the data in terms of 
“predominant choices,” the number 
of subjects choosing the intermediate 
stimulus in 5 or more test trials out 
of 10; while Reese presented the 
number of intermediate choices in 6 
test trials but only reinforced the 
intermediate stimulus during the test 
session. There is no basis for discus- 
sion of the results of these experiments 
in terms of the ratio theory. 

The value of y (since x + y = 1.00, 
the constant values are given in terms 
of y alone) may range from .01-.50 
to fit the results of Spence’s (1942) 
experiment with chimpanzees and 
from .10-.99 for the Gonzalez et al. 
(1954) study with the same species. 


In the experiments of Gentry et al. 
(1959) and Brown et al. (1959) con- 
ducted with rhesus monkeys, the 
value of y can range from .10-.40 
while it must be between .60-.80 
to explain the Stevenson and Bitter- 
man (1955) study with young chil- 
dren. The following specific values 
were established for the purpose of 
analysis: Gentry et al. and Brown et 
al. = .30, Spence and Gonzalez et al. 
=.40, Stevenson and Bitterman =.60. 
The predictions and results (except 
for Stevenson and Bitterman) are 
given in Tables 1-4. The positive 
training ratio in all of these studies 
was 1.00. It may be of interest that 
the maximal possible level of y con- 
sistently increases with higher phyletic 
levels despite the fact that subsequent 
experimentation (Zeiler, 1962) sug- 
gested that the differential values 
might be explained on the basis of the 
specific training sets employed in each 
study. 

The only excluded data are from 


TABLE 2
DATA FROM GONZALEZ, GENTRY, AND BITTERMAN (1954)

Final training set (S I L) | Test set (S I L)  | Test ratios (S I L)     | Predicted predominant choice | Mean response in test (S I L)
11.9  15.7  20.8           | 10.4  13.7  18.1  | [illegible]  .92  1.21  | 13.7 (I)                     | .25  9.5  2.25
11.9  15.7  20.8           | 13.7  18.1  23.9  | .83  1.09  1.44         | 18.1 (I)                     | 4.5  7.5  0.00

Note. Stimulus sizes in square inches. Positive training ratio = 1.00.
TABLE 3
DATA FROM GENTRY, OVERALL, AND BROWN (1959) AND BROWN, OVERALL,
AND GENTRY (1959)

Training set (S I L) | Test set (S I L)  | Test ratios (S I L) | Predicted predominant choice | Mean response in test (S I L)
11.9  15.7  20.8     | 10.4  13.7  18.1  | .69  .91  1.20      | 13.7 (I)                     | .88  8.88  2.25
11.9  15.7  20.8     | 13.7  18.1  23.9  | .84  1.10  1.46     | 18.1 (I)                     | 3.00  8.88  .13
11.9  15.7  20.8     | 9.0  11.9  15.7   | .62  .82  1.08      | 15.7 (L)                     | 0.00  .33  11.67
11.9  15.7  20.8     | 15.7  20.8  27.5  | .92  1.22  1.61     | 15.7 (S)                     | 11.78  .11  .11

Note. Two conditions omitted: see text. Stimulus sizes in square inches. Positive training ratio = 1.00.


two conditions of the Brown et al. 
experiment. With training on a set 
composed of stimuli 11.9, 15.7, and 
20.8 square inches and test with a set 
composed of stimuli 10.4, 13.7, and 
18.1 square inches, eight of nine 
subjects responded to the largest 
stimulus on most test trials. Such a 
response is clearly opposed to predic- 
tions made by the ratio theory, by 
absolute theory, and by relational 
theory as well as to the position 
maintained by the experimenters. It 
is also put in question by the findings 
of the same investigators (Gentry et 
al. ran the same condition and found 
clear preference for the intermediate 
stimulus) and the data of Gonzalez 
et al. For these reasons, there would 
seem to be some justification in ex- 


cluding these data from consideration 
until the finding is found to be a stable 
one by future experimentation. Al- 
though the results for the test set 
composed of stimuli 13.7, 18.1, and 
23.9 square inches supported the ratio 
theory, these have also been excluded 
from Table 3 where the predictions 
and results are presented. 

Stevenson and Bitterman presented 
their data in terms of the frequency of 
choice of the intermediate stimulus on 
the first test trial rather than in terms 
of the frequency of response to each 
of the three stimuli. Six of the twelve 
subjects in Group I (a and b) and one 
subject in Group II (a and b) chose 
the intermediate stimulus on the first 
test trial while no subject in either 
group selected the intermediate stimulus
on the first trial of the second test.

TABLE 4
TEST STIMULUS RATIOS FROM STEVENSON AND BITTERMAN (1955)

Group | Training set (S I L) | Test 1: Test set (S I L) | Test 1: Test ratios (S I L) | Test 2: Test set (S I L) | Test 2: Test ratios (S I L)
Ia    | 4.0  5.6  7.8        | 5.6  7.8  11.0           | .82  1.14  1.60             | 21.5  30.1  42.2         | 1.29  1.82  2.54
Ib    | 21.5  30.1  42.2     | 15.4  21.5  30.1         | .62  .87  1.22              | 4.0  5.6  7.8            | .40  .55  .77
IIa   | 4.0  5.6  7.8        | 21.5  30.1  42.2         | 1.40  1.95  2.74            | 5.6  7.8  11.0           | .55  .77  1.08
IIb   | 21.5  30.1  42.2     | 4.0  5.6  7.8            | .36  .51  .71               | 15.4  21.5  30.1         | .94  1.31  1.84

Note. Stimulus sizes in square inches. Positive training ratio = 1.00. Residual AL: Test 1 is Training AL; Test 2 is Test 1 AL.

The ratio theory predicted inter- 
mediate choice only for Group I 
(a and b) in Test 1 and random 
response for all other conditions, 
except in Test 2 for Group II. In this 
test predicted response was to the 
large stimulus for Group IIa and to
the small stimulus for Group IIb. 
All predictions are tentative for Test 2 
for all groups since six rewarded trials 
on Test 1 intervened between the 
original training set and Test 2 mak- 
ing it arbitrary to assume that what 
is transferred to this second test is the 
positive training ratio of the original 
training set. Insufficient data were 
provided for evaluation of most of the 
predictions; however, in the only cases 
where intermediate response was pre- 
dicted it did occur. 

Zeiler (1962) conducted a series of 
experiments designed specifically to 
test the adequacy of the ratio theory. 
The value of y was assumed to be .60 
for young children on the basis of the 
adequacy of this value to deduce the 
results of Stevenson and Bitterman 
(1955) for this category of subjects. 
Eight stimuli were prepared, the 
smallest (1) having an area of 4.00 
square inches and each succeeding 
stimulus increasing in area by a factor 
of 1.4. Eight stimuli taken three at a
time yield 56 sets, or 3,136 possible
combinations of training and test sets
(56 × 56), for which the ratio theory
makes a specific prediction as to response
on the first test trial. These
predictions are produced with equal 
facility whether each stimulus set is 
composed of stimuli equally different 
in size from each other (symmetrical) 
or is composed of two similar, al- 
though not identical, stimuli and one 
element very different from the other 
two (asymmetrical). 
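
The counts in this paragraph can be checked with a short sketch. The names `AREAS`, `SETS`, `PAIRS`, and `is_symmetrical` are illustrative and do not come from the original study; the test for symmetry (equal numbers of steps between adjacent members of the series) is an assumed reading of "equally different in size from each other."

```python
from itertools import combinations

# Areas in square inches: Stimulus 1 = 4.00, each later stimulus 1.4 times larger.
AREAS = {k: 4.00 * 1.4 ** (k - 1) for k in range(1, 9)}

# Every set of three distinct stimuli, ordered (small, intermediate, large).
SETS = list(combinations(range(1, 9), 3))

# Each training set can be paired with each test set.
PAIRS = [(train, test) for train in SETS for test in SETS]


def is_symmetrical(stimulus_set):
    """Assumed criterion: adjacent members lie the same number of steps apart."""
    small, intermediate, large = stimulus_set
    return intermediate - small == large - intermediate


print(len(SETS), len(PAIRS))   # 56 3136
print(round(AREAS[8], 1))      # 42.2
print(is_symmetrical((2, 4, 6)), is_symmetrical((1, 2, 8)))  # True False
```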

The same procedure was used in all 
of these experiments, which were
conducted with 4- and 5-year-old children
from day care centers in New York 
City. Training was to a criterion of 
five successive first choices of the 
intermediate stimulus: the first five 
trials used a correction method and 
all subsequent trials employed a non- 
correction technique. The intertrial 
interval was the time required to 
position the stimulus blocks (cut from 
.25-inch unpainted Masonite) and to
place the goal object, a small red 
plastic chip, under the intermediate- 
sized component. The subject was 
congratulated when the goal object 
was found. Immediately upon the 
attainment of criterion, the test set 
blocks were substituted for those used 
in training and the subject's first choice
was recorded. The positioning and
baiting of blocks for each trial (which
of the six block positions was used on
each trial was predetermined by a table
of random numbers) and the substitution
of test for training set were accomplished
behind a screen separating
the experimenter from the subject.
Following the single test trial, the 
subject's ability to conceptualize the 
relationship of intermediate size as 
indexed by either verbalization of the 
concept or consistent identification 
of the intermediate-sized block was 
determined. Each subject was used 
on one training-test combination. 
Two experiments dealt with the 
predictions of the ratio theory with 
combinations employing symmetrical 
training and test sets. The first of 
these held training constant with
Sets 1-2-3 or 3-4-5 and systematically 
manipulated the distances of the test 
set from the training set. Table 5 
indicates the training-test combina- 
tions, the predictions, and the results 
and reveals that the points of occur- 
rence of transposition, nonrelational 
choice, and random response were 
correctly predicted in 9 of the 10 
cases. At this point in the development
of the theory, the only prediction
made was that the largest number of
choices would be to the predicted
stimulus. Results were considered
random if responses to each stimulus
were approximately equal and nonmonotonic.
Although the sole divergence
from prediction was with a
combination for which random response
was forecast, the other three
such cases supported Hypothesis 4b.
A separate study replicated the same 
phenomenon: training was with 6-7-8 
and test with 1-2-3 and the results 
revealed that four subjects responded 
to Stimulus 1, three to Stimulus 2, and 
five to Stimulus 3. Why there seemed 
to be a consistent tendency for the 
intermediate to be the least preferred 
of the three test stimuli is not cur- 
rently understood. 

The second experiment on sym- 
metrical stimulation employed seven- 
teen conditions and varied the area 
factor separating the stimuli of both 
the training and test sets. The 
combinations, predictions, and re- 
sults appear in Table 6 and reveal 
correct forecast in 11 of the 17 
cases. The six erroneous predictions
all occurred due to the unexpected 
breakdown of transposition support- 
ing Spence’s (1942) contention that 
as the distance between training 
set elements increases, transposition 
should decrease. In terms of the ratio 
theory, these data can be explained on 
the basis that as the factor separating 
the training set stimuli increases, the 
value of y decreases. In the first 
experiment where the area factor 
separating the training set stimuli was 
1.4, a y value of .60 was adequate 
while in this study, an area factor of 
1.95 (e.g., Set 2-4-6) required that y 
be .40-.50 and an area factor of 2.75 
(e.g., Set 1-4-7) required that the y 
value be .20-.50. This would imply 
that the more distinctive are the 
stimuli of the training set in relation 
to each other, the greater relative 
effect they have in the establishment 
of the test adaptation level. 

The ratio theory predicts that with 
the test set held constant appropriate 
selection of training sets should result 
in response to either the large, inter- 
mediate, or small test stimulus al- 
though in each case training is to the 


TABLE 5
STIMULUS SETS, PREDICTED RESPONSE, AND FREQUENCY OF RESPONSE
(N per group = 12)

Group | Training set (S I L) | Test set (S I L) | Test ratios (S I L) | Predicted predominant response | First test trial response (S I L)
1     | 1  2  3              | [illegible]      | .82  1.14  1.60     | I                              | [illegible]
2     | 1  2  3              | [illegible]      | .93  1.31  1.83     | S                              | 6  4  2
3     | 1  2  3              | 4  5  6          | 1.07  1.50  2.09    | random                         | 8  2  2
4     | 1  2  3              | 5  6  7          | 1.22  1.71  2.39    | random                         | 5  3  4
5     | 1  2  3              | 6  7  8          | 1.40  1.95  2.74    | random                         | 6  2  4
6     | 3  4  5              | [illegible]      | [illegible]         | [illegible]                    | [illegible]
7     | 3  4  5              | [illegible]      | [illegible]         | I                              | 0  9  3
8     | 3  4  5              | [illegible]      | .82  1.15  1.60     | I                              | [illegible]
9     | 3  4  5              | [illegible]      | .93  1.30  1.82     | S                              | 11  0  1
10    | 3  4  5              | [illegible]      | [illegible]         | random                         | [illegible]

Note. Positive training ratio = 1.00.
TABLE 6
STIMULUS SETS, PREDICTED RESPONSE, AND FREQUENCY OF RESPONSE
(N per group = 12)

Group | Training set (S I L) | Test set (S I L) | Test ratios (S I L) | Predicted predominant response | First test trial response (S I L)
1     | [illegible]          | [illegible]      | [illegible]         | [illegible]                    | 3  9  0
2     | 1  3  5              | 3  5  7          | .66  1.30  2.55     | I                              | 10  2  0
3     | 1  3  5              | 4  6  8          | .76  1.49  2.92     | S                              | 10  1  1
4     | 2  4  6              | 1  2  3          | .55  .76  1.06      | L                              | 0  1  11
5     | 2  4  6              | 2  3  4          | .63  .87  1.23      | I                              | 0  5  7
6     | 2  4  6              | 4  5  6          | .82  1.15  1.60     | I                              | 6  6  0
7     | 2  4  6              | 5  6  7          | .93  1.30  1.82     | S                              | 12  0  0
8     | 3  5  7              | 2  4  6          | .44  .87  1.71      | I                              | 0  10  2
9     | 3  5  7              | 4  6  8          | .58  1.14  2.24     | I                              | 1  11  0
10    | 4  6  8              | 1  3  5          | .34  .67  1.31      | L                              | 0  0  12
11    | 4  6  8              | 2  4  6          | .39  .76  1.49      | I                              | 0  4  8
12    | 1  4  7              | 3  5  7          | .58  1.15  2.25     | I                              | 2  10  0
13    | 1  4  7              | 4  6  8          | .67  1.30  2.55     | I                              | 11  1  0
14    | 2  5  8              | 2  3  4          | .54  .76  1.07      | L                              | 1  2  9
15    | 2  5  8              | 3  4  5          | .62  .87  1.22      | I                              | 1  7  4
16    | 2  5  8              | 5  6  7          | .82  1.14  1.60     | I                              | 6  6  0
17    | 2  5  8              | 6  7  8          | .93  1.31  1.83     | S                              | 12  0  0

Note. Positive training ratio = 1.00.


intermediate member of the initial set. 
Twelve test conditions were randomly 
selected from those that would yield 
this range of prediction and were 
matched with appropriate training 
sets. Sets were both symmetrical and 
asymmetrical (Table 7). Each of the 
36 combinations was administered to 
a single subject: 33 of the 36 subjects 
responded as predicted with one 
erroneous prediction occurring for the 
“Large” group and two for the 
“Intermediate” group. 

Another prediction of the ratio 
theory is that when two of the three 
test stimuli have ratios equally sim- 
ilar to the positive training ratio 
there should be equal choice of these 
two elements. Two combinations for 
which this prediction is made are 
those of training with either 1-3-8 
or 2-3-7 and testing with 2-4-7. In 
both cases, the positive training ratio 
is .71 and the test stimuli have the 


following ratios: 2 = .48, 4 = .94, 
7 = 2.57. Since both Stimuli 2 and 
4 differ from the positive training 
ratio by .23 ratio units and Stimulus 7 
differs by 1.86 units, it is expected 
that responses would be equally 
divided between Stimuli 2 and 4. 
Since the hypothesis of no difference 
was logically impossible to prove, four 
other combinations were run with the 
same test set in which one of the two 
stimuli was the predicted predominant 
choice. Predictions and data are 
given in Table 8. The distribution of 
responses for the “equal choice” 
groups (Groups 1 and 2) was signifi- 
cantly different from those of the 
other four groups indicating that re- 
sponse was other than that deter- 
mined by a single equivalent test set 
member. This finding of equal choice 
(or dual equivalence) of two of the 
three test stimuli had not been re- 
ported previously in the literature. 
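
The equal-choice forecast can be sketched as follows. The adaptation-level reading used here (log AL = x log R + y times the mean log area of the test set, x = 1 - y, R the geometric mean of the training set, y = .60) is an assumption that matches the tabled ratios to within rounding. Unrounded, the two distances for Groups 1 and 2 come out near .24 and .22 rather than exactly .23 each, so the sketch treats distances within a tolerance `tol` as tied; that tolerance and the function names are hypothetical and not part of the original theory.

```python
import math


def area(k):
    """Area of Stimulus k in square inches (Stimulus 1 = 4.00, factor 1.4)."""
    return 4.00 * 1.4 ** (k - 1)


def mean_log_area(stimuli):
    return sum(math.log(area(k)) for k in stimuli) / len(stimuli)


def ratio_distances(train, positive, test, y=0.60):
    """Distance of each test ratio from the positive training ratio."""
    target = area(positive) / math.exp(mean_log_area(train))
    log_al = (1.0 - y) * mean_log_area(train) + y * mean_log_area(test)
    return {k: abs(area(k) / math.exp(log_al) - target) for k in test}


def predicted_choices(train, positive, test, y=0.60, tol=0.025):
    """Stimuli whose distance is within tol of the smallest distance."""
    d = ratio_distances(train, positive, test, y)
    best = min(d.values())
    return [k for k in test if d[k] - best <= tol]


# Training set and positive stimulus for the six groups tested with Set 2-4-7.
GROUPS = {
    1: ((1, 3, 8), 3), 2: ((2, 3, 7), 3), 3: ((1, 4, 8), 4),
    4: ((1, 2, 8), 2), 5: ((2, 3, 6), 3), 6: ((2, 3, 8), 3),
}
for group, (train, positive) in GROUPS.items():
    print(group, predicted_choices(train, positive, (2, 4, 7)))
```

With these settings only the first two groups return two candidate stimuli (2 and 4); the other four return a single stimulus.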


TABLE 7
TRAINING AND TEST SETS

Training set (S I L) | Test set, by predicted predominant response: Small | Intermediate | Large
niad 4) | e eS 
2 | ¢-| 5! E ees 
3) 6) 7.) 1-2-6) 678 7-8 
3 | 4 |. 5°] 143-6),| 045-84) 4-6-7 
1 | S$) |56>)-"1-2-90| e368) I 1-6-7 
4|5|7] 2-3-6 | 5-68 | 47-8 
4) 51.6) 348 | 3-445°|) 527-8 
3° (sell 56h)" e Ia E 
Jul sabi Se tao=7ie eset 6-728 
2 | aah 40 eias | 2-4-8? | 1S-6-8 
34 4° |96:] 2-3-8" | 3-7 || 327-8 
SOI 12-4 ies: | 227-8 


The data also revealed the accuracy 
of the ratio theory in predicting both 
transposition (Groups 3 and 5) and 
nonrelational choice (Groups 4 and 
6) with asymmetrical training and 
test sets. 

Besides the phenomenon of dual 
equivalence, the ratio theory moti- 
vated research that resulted in the 
discovery of additional new effects 
(Zeiler, 1963) that were direct evidence
for the adequacy of this conception
as opposed to alternative
theoretical views. The first of these 
was the significant preference for a 
previously negative stimulus when the 
test set differed from the training set 
only in that one of the negative 
stimuli was replaced by a new element. 
An example of this situation is as 
follows: Train 1-2-8, Stimulus 2 
positive, Test 1-2-3; positive training 
ratio = .57, test ratios: Stimulus 1 
= .57, Stimulus 2 = .80, Stimulus 
3=1.11; predicted first test trial 
response is to Stimulus 1. The 20 of 
the 3,136 total combinations that 


produced this type of prediction are 


presented in Table 9. Three subjects — 
were run on each combination and 
46 of the 60 subjects responded 
as predicted. While 3 of the 20 
combinations when analyzed in terms 
of majority choice did not support 
prediction and a subsequent study 
that tested 2 of the 20 combina- 
tions with a larger N per training- 
test pair showed that in one of these 
cases (Train 1-2-8, Test 1-2-5) there 
was significant preference for Stimulus 
2 when Stimulus 1 was predicted and 
in another case (Train 1-7-8, Test 
6-7-8) response was divided equally 
between Stimuli 7 and 8 when 
Stimulus 8 was the predicted choice, 


TABLE 8
STIMULUS SETS, PREDICTED RESPONSE, AND FREQUENCY OF RESPONSE
(N per group = 20)

Group | Training set (S I L) | Positive training ratio | Test ratios (2 4 7) | Predicted response | First test trial response (2 4 7)
1     | 1  3  8              | .71                     | .48  .94  2.57      | 2 and 4            | 10  10  0
2     | 2  3  7              | .71                     | .48  .94  2.57      | 2 and 4            | 9  11  0
3     | 1  4  8              | .89                     | .46  .89  2.45      | 4                  | 1  19  0
4     | 1  2  8              | .57                     | .50  .98  2.69      | 2                  | 16  4  0
5     | 2  3  6              | .80                     | .50  .98  2.69      | 4                  | 3  15  2
6     | 2  3  8              | .63                     | .46  .89  2.45      | 2                  | 14  6  0
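TABLE 9
STIMULUS SETS AND PREDICTED RESPONSE

Training set (S I L) | Positive training ratio | Test set (S I L)
1  2  6              | .72                     | 1  2  [illegible]
1  2  7              | .64                     | 1  2  [illegible]
1  2  7              | .64                     | 1  2  [illegible]
1  2  8              | .57                     | 1  2  [illegible]
1  2  8              | .57                     | 1  2  [illegible]
1  2  8              | .57                     | 1  2  [illegible]
2  3  7              | .71                     | 2  3  [illegible]
2  3  8              | .63                     | 2  3  [illegible]
2  3  8              | .63                     | 2  3  [illegible]
3  4  8              | .71                     | 3  4  [illegible]
1  5  6              | 1.40                    | 4  5  [illegible]
1  6  7              | 1.56                    | 4  6  [illegible]
1  6  7              | 1.56                    | 5  6  [illegible]
1  7  8              | 1.75                    | 4  7  [illegible]
1  7  8              | 1.75                    | 5  7  [illegible]
1  7  8              | 1.75                    | 6  7  [illegible]
2  6  7              | 1.40                    | 5  6  [illegible]
2  7  8              | 1.57                    | 5  7  [illegible]
2  7  8              | 1.57                    | 6  7  [illegible]
3  7  8              | 1.40                    | 6  7  [illegible]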


the evidence indicated by the majority 
of the data in the initial study sug- 
gested that the obtained effect was 
a real one. 

The ratio theory also predicted 
increasing response to Stimulus 1 as 
the test set was shifted from 1-2-6 to 
1-2-5 to 1-2-4 to 1-2-3 following 
training with Set 1-2-8 (Stimulus 2 
positive) since the ratio of Stimulus 1 
in test became increasingly more 
similar to the positive training ratio 
relative to the other test stimulus 
ratios with each shift. Four separate 
groups were run with each combina- 
tion administered to 30 subjects. The 
accuracy of the forecast can be 
determined by reference to Figure 1 
where the data are presented graph- 
ically. Furthermore, the condition 
of Train 1-2-8, Test 1-2-3 showed 
significant preference for Stimulus 1 
which was a replication of the major 
effect of the previous study although
the results for the combination Train 
1-2-8, Test 1-2-4 provided an addi- 
tional condition yielding results op- 
posed to prediction. 
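
The shift forecast can be traced numerically with a small sketch. It assumes y = .60 and the reading log AL = x log R + y (mean log area of the test set), with x = 1 - y and R the geometric mean of the training set; the helper names are hypothetical. For each test set it reports how far the ratios of Stimulus 1 and Stimulus 2 lie from the positive training ratio established with Set 1-2-8.

```python
import math


def area(k):
    """Area of Stimulus k in square inches (Stimulus 1 = 4.00, factor 1.4)."""
    return 4.00 * 1.4 ** (k - 1)


def mean_log_area(stimuli):
    return sum(math.log(area(k)) for k in stimuli) / len(stimuli)


TRAIN, POSITIVE, Y = (1, 2, 8), 2, 0.60


def distances(test):
    """Distance of each test ratio from the positive training ratio."""
    target = area(POSITIVE) / math.exp(mean_log_area(TRAIN))
    log_al = (1.0 - Y) * mean_log_area(TRAIN) + Y * mean_log_area(test)
    return {k: abs(area(k) / math.exp(log_al) - target) for k in test}


TEST_SETS = [(1, 2, 6), (1, 2, 5), (1, 2, 4), (1, 2, 3)]
for test in TEST_SETS:
    d = distances(test)
    # A growing margin means Stimulus 1 is increasingly favored over Stimulus 2.
    margin = d[2] - d[1]
    print(test, round(d[1], 2), round(d[2], 2), round(margin, 2))
```

Under these assumptions the margin favoring Stimulus 1 grows with each shift, and with Test Set 1-2-3 the ratio of Stimulus 1 coincides with the positive training ratio.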
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~ 
~~ 


FREQUENCY OF RESPONSE 
a 


2-6 2-3 


1-2-5 
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Fic. 1. Distribution of responses on first 
test trial to the small and the intermediate 
stimulus for each test set. 
These predictions and results, that
is, systematic but neither absolute 
nor relational choice (one case is 
sufficient to establish the effect) as 
well as the shift phenomenon, seem to 
be the most novel aspects of the theory 
and represent discoveries previously 
unreported in the history of the 
transposition problem that are of 
importance for all theories. Because 
of its significance, the first finding has 
been replicated six times with an N
of more than 300 children.³ Sometimes
the predicted result has been
obtained, otherwise predominant re- 
sponse to the intermediate or equal 
choice of the predicted and the inter- 
mediate stimulus has occurred. In 
terms of other theories, a shift to 
equal preference for the previously 
negative stimuli is virtually as damag- 
ing as the absolute preference for the 
negative element: there is no basis 
for the expectation that there should 
be any response to that element either 
in terms of absolute or relational 
formulations. It seems that whether 
or not the result predicted by the ratio 
theory occurs may depend upon what 
the subject attends to during the 
original learning. Specifically, it ap- 
pears that when the extreme stimulus 
(e.g., Stimulus 8 in Set 1-2-8) is fully 
attended to, the predicted result will 
be obtained. This suggests that as 
the set comes to be considered as 
consisting of two “important” stimuli 
and one relatively “unimportant” 
component, the net effect of the set 
is different than when it contains 
three equal members. Increasing 
asymmetry may operate to accen- 
tuate this condition so that a tech- 
nique that insures maximal attention 
to all of the training stimuli is 


3 The majority of the replication work was 
conducted by Jane Tulipan and Eleanor 
Tinsley. 
required. It should be noted that
Brown (1953) demonstrated the role 
of selective attention in the establish- 
ment of the adaptation level: a 
stimulus must be considered as part 
of the series to exert an effect. If the 
assumption was made that training 
set asymmetry operates on the con- 
stant values so that there is a positive 
relationship between the degree of 
symmetry and y, the value of y 
following training with 1-2-8 (a 
maximally asymmetrical set) might 
be established at .30. This would 
result in the deduction of both the 
supporting and all of the negative 
results. 

Conceptualization of the relation- 
ship of intermediate size appeared to 
have no differential effect on behavior 
or on the accuracy of prediction in any 
of Zeiler’s experiments. The majority 
of the subjects indexed as capable of 
abstraction responded as forecast by 
the ratio theory. This would suggest 
that this variable need not be con- 
sidered as important in the problem 
of intermediate size discrimination 
with children below the age of six. 

As revealed by research up to the 
current time, there are six basic 
phenomena that any theory of the 
stimulus in the intermediate size 
problem must explain. They are: 
(a) relational choice (transposition) ; 
(b) systematic absolute response;
(c) systematic choice of neither an 
absolute nor relational stimulus; 
(d) equal preference for two of the 
three stimuli; (e) random response; 
(f) the points of transition between 
each of these modes of response. The 
fact that this relatively large degree 
of diversity in the mode of response 
can be obtained in the intermediate 
size problem suggests that this situa- 
tion is a more sensitive test of alter- 
native theories than is the two-choice 
transposition test. By use of the
ratio theory definition of the stimulus 
in the intermediate size problem, it 
seems that it is possible to provide a 
unified explanation of these phe- 
nomena. 


SPECIFIC AND GENERAL IMPLICATIONS 


The author’s data have indicated 
the need for some revisions in thinking 
regarding the constants in the adapta- 
tion level formula despite the fact 
that first test trial response was cor- 
rectly predicted in the majority of 
cases. However, the y value of .60 
resulted in some erroneous predictions 
in cases where either training set 
asymmetry was extreme or the area 
factor separating the elements of the 
training set exceeded 1.4. Two pos- 
sible solutions to this problem have 
emerged: 


1. The constants (x and y) are a 
function of certain measurable stimu- 
lus set variables such as symmetry and 
the gap separating the stimuli of the 
training set. This would imply that 
the relative effect of past experience 
on size perception is not fixed but is 
variable depending on the specific 
nature of that past experience. 

2. Attentional factors may require 
differential weighting of the stimulus 
contributions to the adaptation level 
so that the correct formula may be: 


\[
\log AL = y\left(\frac{a \log X_1 + b \log X_2 + c \log X_3}{3}\right) + x \log R
\]

where a, b, and c represent the
differential weights applied to the
individual series stimuli and X_1, X_2,
and X_3 represent these stimuli. With
reference to both of these possible 
solutions, there is as yet no rational 
method for precise statement of the
various constants; such determination 
would seem to be best accomplished 
empirically. 
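
The second proposal can be written out as a minimal sketch of the weighted formula, log AL = y (a log X1 + b log X2 + c log X3)/3 + x log R, with x = 1 - y. The particular weights and the residual value used below are hypothetical, chosen only to show the direction of the effect; as the text notes, no rational method for fixing them is yet available.

```python
import math


def weighted_log_al(stimuli, weights, residual, y):
    """Weighted adaptation level in log units, with x = 1 - y."""
    x = 1.0 - y
    weighted_sum = sum(w * math.log(s) for w, s in zip(weights, stimuli))
    return y * weighted_sum / len(stimuli) + x * math.log(residual)


# Set 1-2-8 from the 1.4 area series, in square inches.
areas = (4.00, 4.00 * 1.4, 4.00 * 1.4 ** 7)
residual = 10.0  # hypothetical residual value, for illustration only

# Equal weights reduce the formula to its unweighted form.
equal = math.exp(weighted_log_al(areas, (1.0, 1.0, 1.0), residual, 0.60))

# Hypothetical case: the extreme stimulus receives only half weight.
discounted = math.exp(weighted_log_al(areas, (1.0, 1.0, 0.5), residual, 0.60))

print(round(equal, 2), round(discounted, 2))
```

Because every area here exceeds 1, giving less weight to the largest stimulus lowers the computed adaptation level, which in turn raises every stimulus/AL ratio in the set.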

Another significant problem con- 
fronting the ratio theory is the speci- 
fication of the function relating the 
relative differences in test stimuli 
ratios from the positive training ratio 
to probability of response. Hypoth- 
esis 4a of the theory should ultimately 
be expressed in terms of a precise 
mathematical statement. Although 
such an undertaking would be pre- 
mature until the constant values can 
be definitely established, it is realized 
that theoretical adequacy and quanti- 
tative accuracy do eventually require 
such a presentation. 

Since the ratio theory has been 
demonstrated to have the ability to 
deduce and predict more of the exist- 
ing data in the intermediate size 
problem than alternative formula- 
tions and sufficient flexibility to 
explain all of the results, it is neces- 
sary to examine the implications of 
the ratio theory for the general 
problem of stimulus generalization. 
Hypothesis 4a suggests that the 
phenomenon of stimulus generaliza- 
tion should be interpreted in terms of 
stimulus/AL ratios rather than by 
reference to individual stimulus ener- 
gies. However, Hypothesis 4b pre- 
cludes the consideration of relative 
response strengths as solely a function 
of test stimulus ratio-positive training 
ratio similarity. Lashley (1938) 
stated that cartwheels and pinheads 
are not on the same psychological 
dimension but failed to suggest the 
point at which these dimensions 
diverge. The ratio theory attempts 
to explain the absolute range of 
equivalence by claiming that equiva- 
lence has two aspects: (a) equiva- 
lence of sets; and (b) equivalence of 
stimuli within sets. If all of the test
stimulus ratios are larger or smaller 
than the positive training ratio, the 
subject perceives all of the stimuli as 
larger or smaller than that previously 
approached and behaves in the situa- 
tion as if it were novel so that previ- 
ously learned factors do not apply. 
The sets are nonequivalent in this 
case despite the fact that a single 
test stimulus ratio might be quite 
similar to the positive training ratio. 
However, individual stimulus simi- 
larity principles will not operate when 
the total situation appears to be un- 
related to past experience in that the 
sizes of the stimuli of the training 
and test sets are not on the same 
psychological continuum although 
they vary on the same physical dimen- 
sion. This loose interpretation is 
strengthened in the ratio theory with 
the a priori statement of the precise 
point at which seemingly similar sizes 
diverge into unrelated psychological 
experiences. 
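
The two-part rule (nonequivalence of sets first, then the nearest ratio within an equivalent set) can be sketched as below. It assumes y = .60 and the reading log AL = x log R + y (mean log area of the test set), with x = 1 - y and R the geometric mean of the training set; function names are hypothetical. The example uses Training Set 1-2-3 with five successively more distant test sets.

```python
import math


def area(k):
    """Area of Stimulus k in square inches (Stimulus 1 = 4.00, factor 1.4)."""
    return 4.00 * 1.4 ** (k - 1)


def mean_log_area(stimuli):
    return sum(math.log(area(k)) for k in stimuli) / len(stimuli)


def predict(train, positive, test, y=0.60):
    """Return 'S', 'I', 'L', or 'random' for the first test trial."""
    target = area(positive) / math.exp(mean_log_area(train))
    log_al = (1.0 - y) * mean_log_area(train) + y * mean_log_area(test)
    ratios = [area(k) / math.exp(log_al) for k in test]
    # Nonequivalent sets: every test ratio lies on one side of the target.
    if min(ratios) > target or max(ratios) < target:
        return "random"
    # Equivalent sets: choose the test ratio nearest the target.
    nearest = min(range(3), key=lambda i: abs(ratios[i] - target))
    return "SIL"[nearest]


for test in [(2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8)]:
    print(test, predict((1, 2, 3), 2, test))
```

Under these assumptions the forecasts run I, S, random, random, random, so the transition to random response falls where the smallest test ratio first exceeds the positive training ratio of 1.00.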

The ratio theory maintains that the 
phenomenon of stimulus generaliza- 
tion is explained in terms of equiva- 
lence while equivalence is analyzed in 
terms of stimulus/AL ratios. Gen- 
eralization gradients are derivable 
from Hypothesis 4a on the basis of the 
relative proximities of the three test 
stimulus ratios to the positive training 
ratio. However, empirical support of 
Hypothesis 4b has suggested that it is 
misleading to think of the problem in 
this traditional framework since it 
means that there is no plausible 
manner of stating fixed gradients 
even in terms of relative ratio differ- 
ences. Actually, there is no intrinsic 
reason that nonequivalence should 
occur exactly at the point stated by 
Hypothesis 4b. Despite the fact that 
such a place of departure from the 
basic choice hypothesis (4a) is reasonable,
Hypothesis 4b is an ad hoc
statement although it was established 
in advance of data collection. The 
underlying postulate, that in order to 
be perceived as similar stimuli must 
lie on the same psychological con- 
tinuum, only requires that there be 
such a point, not that it be at the 
specific place stated. The novelty of 
the ratio theory is that two separate 
functions are postulated to explain 
decreasing stimulus generalization and 
complete nongeneralization so that 
the latter phenomenon is not theo- 
retically conceived as the lower limit 
of the former. James’ (1953) analysis 
of two element problems which also 
employs the concept of an abrupt 
transition from single stimulus prefer- 
ence to nonequivalence as indexed by 
equal preference for both stimuli is 
compatible with this point of view. 
Whether nonchoice situations such as 
single stimulus training and test 
follow the same principles remains to 
be evaluated. 

The research of Kuenne (1946), 
Alberts and Ehrenfreund (1951), and 
Reese (1962) on transposition as well 
as that of the Kendlers and their
associates (summarized by Kendler, 
1960) on reversal and nonreversal 
shifts has revealed that there is a 
change from simpler processes to 
concepts as the basis of learning at 
some critical age level. This work has 
suggested that age rather than the 
presence of verbal ability is the 
determining factor and that the 
absolute age varies with the specific 
problem. Zeiler’s (1962) demonstra- 
tion of the irrelevance of verbalization 
or concepts of intermediate size at the 
level of 4- and 5-year-old children 
fits well with the conclusions of these 
authors. A compelling problem is the 
discovery of the processes by which 
the basis of learning undergoes this 
change; however, the ratio theory is
restricted (as is the Spence formula- 
tion) to organisms prior to the stage 
at which concepts become the domi- 
nant approach to learning. 

There is a large and varied quantity 
of data that indicate the incomplete- 
ness of adaptation level theory as a 
general model for discrimination learn- 
ing. These classes of data can, in 
general, be subsumed under the head- 
ings of attention, opportunity for 
comparison of the stimuli, and con- 
figurational influences in perception 
(Bitterman, 1953), and include what 
Spence (1952) has called transverse 
patterning. Such influences seem 
refractory to adequate treatment by 
all theories at the present time and 
exist as phenomena still in need of 
rigorous explanation. The ratio 
theory makes no pretense of even sug- 
gesting a solution to these problems. 

With respect to absolute and rela- 
tional theories of the stimulus, the 
ratio theory can be interpreted as an 
integration of the two views. It is 
relational in that the value of the 
stimulus is not purely its physical 
dimensions but rather these dimen- 
sions in relation to the entire stimulus 
set and the other factors entering into 
the adaptation level. It is absolute 
in that absolute ratios are learned and 
responded to in test. It is similar in 
principle to Wallach’s (1948) concept 
of brightness ratio as the determinant 
of achromatic color, but differs from 
Wallach with respect to the definition 
of the factors entering into the 
denominator of the ratio. 

The limitation inherent in other 
theories that are not based on a shift 
of level principle is that the essential 
quality of the test stimuli does not 
vary as a function of their presentation
in the context of current and immediate
past stimulation. Since this
quality depends on the state of the 
organism in establishing the frame of 
reference which is altered by the level 
of stimulation, the ratio theory asserts 
that as a result of the organism change 
between the conclusion of training 
and the time of the response on the 
first test trial the basic nature of the 
stimulus as the determinant of the 
response is also revised. For absolute 
theory, the organism as the receiver 
of stimulation is not only invariant 
but irrelevant. The test components 
derive their excitatory strength auto- 
matically from the generalization 
gradients established as a function of 
prior reinforcement: the state of the 
organism with respect to the response 
in test was fixed at the conclusion of 
training. Relational theory asserts 
that the actual stimuli are irrelevant 
since the learning consists of a rela- 
tionship that transcends particular 
components. The ratio theory alone 
emphasizes the unique contribution 
of the test stimuli to the perception of 
those components: a certain portion 
of the adaptation level which is the 
basis of stimulus definition is deter- 
mined by the specific objects present 
on the first test trial. The subject has 
learned to respond to a certain ratio. 
However, the test stimulus ratio most 
similar to that learned as positive is a 
product of the interaction of previous 
and current stimulation as codeter- 
minants of the adaptation level. 
Since the nature of response in test is 
not considered to be solely the out- 
come of training effects on an unvary- 
ing organism, the ratio theory has 
been able to deduce and predict the 
various phenomena now known to be 
integral aspects of the intermediate 
size problem.

A theory is proposed to explain how a human S produces a serial pattern
from a remembered “concept” or rule, and how he acquires the concept
by induction from an example of a patterned sequence. The theory 
consists of a formal language and a computer program, one part of which 
simulates the process of sequence production, the other, the process of 
rule acquisition. The acquired rule is represented by a symbolic struc- 
ture stored in memory. Several versions of the program show varying 
levels of inductive power. The theory predicts rather successfully 
which problems, from a set of letter series completion test items, will be
the more difficult for human Ss.


In most research on the acquisition 
of concepts, a concept is taken to mean 
a subclass of some class of objects, or, 
alternatively, a procedure for identi- 
fying a particular object as belonging 
to, or not belonging to such a subclass. 
The usual behavioral evidence that a 
subject has attained a concept is that 
he is able to sort objects that embody 
the concept from objects that do not. 
For example, we would say that a sub- 
ject had attained the concept “red” 
if, on instructions to sort a pile of vari- 
ously colored objects, he placed all the 
red objects in one pile and all the 
others in another. 

There is no necessary relation be-
tween the ability to identify objects
exemplifying a concept and the ability
to produce examples of the concept.
A familiar example of the lack of re-
lation between these two abilities is
the discrepancy between an individual’s
reading vocabulary and his
speaking vocabulary (his ability to
understand and his ability to produce
words in a language). In experiments
on memorization, the same discrep-
ancy is familiar as the difference be-
tween ability to recognize and ability
to recall.


1 We are grateful to the Carnegie Corpora-
tion for a research grant that assisted us in
this work, and to several of our colleagues,
including L. W. Gregg and K. R. Laughery,
who turned our attention to the problems of
serial pattern acquisition.

There are some kinds of concepts, 
however, where we commonly meas- 
ure attainment by ability to produce 
an object satisfying the concept rather 
than mere ability to identify an object 
as belonging to the concept. Prom- 
inent among these are concepts in 
the form of serial patterns. For ex- 
ample, the sequence abababa embodies 
the concept of “simple alternation of 
the characters a and b.” We might 
test the subject’s attainment of the 
concept by presenting him with a se- 
quence of characters and asking him 
to decide whether it embodies the con- 
cept or not. More often, however 
(e.g., in the Thurstone Letter Series 
Completion Test), we ask him to dem- 
onstrate his attainment of the con- 
cept by presenting him with a se- 
quence that embodies it, and requiring 
him to extrapolate the sequence. 
Thus, we would say that he had at- 
tained the concept, or recognized the 
pattern, embodied in the sequence 
given above if he were able to write
down ba as the next two characters in
the sequence.²

In this paper we propose to explain 
in what form a human subject remem- 
bers or ‘‘stores’’ a serial pattern; how 
he produces the serial pattern from 
the remembered concept or rule; and 
how he acquires the concept or rule by 
induction from an example. The 
theory takes the form of a computer 
program that simulates the processes 
of sequence production and rule 
acquisition, and that creates in the 
computer memory symbolic structures
to represent the stored concept.³

Three kinds of evidence will be 
offered in support of the theory. First, 
an “existence” proof is provided—it 
is shown that the kinds of symbolic 
representations and information proc- 
esses postulated in the theory are 
sufficient to permit a mechanism en- 
dowed with them to induct, produce, 
and extrapolate patterns. Second, 
the theory is shown to be parsimoni- 
ous in a certain sense—the processes 
and forms of representation postu- 
lated in it are basically the same as 
those that have previously been used 
to explain certain forms of problem 
solving, and rote learning behavior. A 
mechanism possessing the basic capa- 
bilities for performing these other 
tasks has also the capabilities for per- 
forming tasks of the kind we are con-
sidering here. Third, the predictions
of the theory show good qualitative
agreement with the gross behavior of
human subjects in the same tasks, in
particular, predicting the relative
difficulty of different tasks.


2 Notice that neither in concept identifica-
tion nor concept production is it essential
that the concept be referred to by a name, or
even have a name. We shall not consider in
this paper the relation between concept nam-
ing, on the one hand, and concept identifica-
tion or production, on the other.

3 On the methodological issues involved in
the use of computer programs to express and
test psychological theories see Newell and
Simon (1962). Laughery and Gregg (1962)
and Feldman, Tonge, and Kanter (1961) have
incorporated similar mechanisms for detect-
ing and generating serial patterns in tasks
somewhat different from the one considered
in this paper.

The theory casts considerable light 
on the psychological processes in- 
volved in series completion tasks. It 
indicates that task difficulty is closely 
related to immediate memory require- 
ments. It suggests what kinds of 
errors may be expected from human 
subjects in series completion tasks.
It provides a clear-cut and opera-
tional referent for the notion of
“meaningful” as distinct from rote 
organization of material in memory. 


CHARACTERIZATION OF SEQUENTIAL 
PATTERNS 


In Table 1 are shown 25 Thurstone 
letter series completion problems. 
The first 10 problems, designated by 
the letters A through J, were used as 
training problems with our human 
subjects; the last 15, designated by the
numbers 1 through 15, as test prob-
lems. The test problems vary
widely in difficulty, the number of
subjects in a group of 67 who solved
each problem ranging from 27 to 65. 

In explaining human behavior in 
this problem solving task we seek, 
first, to form a plausible hypothesis 
about “what is learned”: about the 
way in which a subject stores such 
patterns in memory in order to re- 
member them, reproduce them, and 
extrapolate them. The first part of
our theory, based on a simple “lan-
guage” for characterizing serial pat-
terns, postulates that such patterns 
are represented in memory by sym- 
bolic structures built from the vocabu- 
lary of such a language. 

TABLE 1
LETTER SERIES COMPLETION PROBLEMS


Training problems


Your task is to write the correct letter in
the blank.

Read the row of letters below.

A. abababab_

The next letter in this series would be a.
Write the letter a in the blank.

Now read the next row of letters and
decide what the next letter should be.
Write that letter in the blank.

B. cadaeafa_

You should have written the letter g.

Now read the series of letters below and
fill in each blank with a letter.

C. aabbccdd_
D. abxcdxefxghx_
E. axbyaxbyaxb_

You will now be told what your answers
should have been.

Now work the following problems for
practice. Write the correct letter in
each blank.

F. rsrtrurvr_
G. abcdabceabcfabc_
H. mnlnknjn_
I. mnomoompom_
J. cegedeheeeiefe_

You will now be told the correct answers.


Test problems


1. cdcdcd_
2. aaabbbcccdd_
3. atbataatbat_
4. abmcdmefmghm_
5. defgefghfghi_
6. qxapxbqxa_
7. aduacuaeuabuafua_
8. mabmbcmcdm_
9. urtustuttu_
10. abyabxabwab_
11. rscdstdetuef_
12. npaoqapraqsa_
13. wxaxybyzczadab_
14. jkqrklrslmst_
15. pononmnmlmlk_


It is obvious that if a subject is able
to extrapolate a sequence, he holds in
memory something different from the
bare sequence with which he was pre-
sented. The sequence, taken by it-
self, provides no basis for its own
extrapolation. Indeed, from a strict 
mathematical standpoint, there is no 
uniquely defined correct answer to 
a serial pattern extrapolation task. 
Consider, for example, the sequence
1, 2, 3, 4, .... What is its continu-
ation? One answer might be 5 but
another, equally valid, would be 1
(i.e., 1, 2, 3, 4, 1, 2, 3, 4, ...).
Still another would be 2 (i.e., 1, 2, 3,
4, 2, ...).

What is common to all these alterna- 
tive solutions is that each is produced 
by a rule that is capable of continuing
the sequence indefinitely. It is prag- 
matically true, although not logically 
necessary, that for the items com- 
monly used on serial pattern tests it 
is easy to get consensus about the cor- 
rect continuation. Presumably, the 
reason for this is that one sequence is 
sufficiently “simpler” or ‘‘more ob- 
vious” than others, that almost all 
persons who find an answer find that 
one first. But it must be emphasized 
that this is a psychological, not a 
logical, matter. 
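
The point can be illustrated with a minimal sketch, which is not part of the original theory. It assumes two hypothetical rules, `counting_rule` and `cycling_rule`, and shows that both reproduce the presented segment 1, 2, 3, 4 while disagreeing on the next term.

```python
from itertools import count, cycle, islice


def counting_rule():
    """Rule 1: keep counting upward (1, 2, 3, 4, 5, ...)."""
    return count(1)


def cycling_rule():
    """Rule 2: cycle on the list (1, 2, 3, 4)."""
    return cycle([1, 2, 3, 4])


def first_terms(rule, n):
    """Return the first n terms produced by a rule."""
    return list(islice(rule(), n))


given = [1, 2, 3, 4]

# Both rules reproduce the presented segment ...
fits_counting = first_terms(counting_rule, 4) == given
fits_cycling = first_terms(cycling_rule, 4) == given

# ... but they disagree on the very next term.
next_by_counting = first_terms(counting_rule, 5)[-1]
next_by_cycling = first_terms(cycling_rule, 5)[-1]
```

Either rule can continue the sequence indefinitely, so the choice between the continuations 5 and 1 cannot be settled by the segment alone.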

It is also not logically necessary that 
a given continuation be associated 
uniquely with a particular rule. There 
may be several different ways of ob- 
taining the same continuation, or of 
representing a particular rule. In- 
deed we shall encounter some ex- 
amples of such multiple possibilities 
as we go along. 

We must begin, then, by saying 
something about what the subjects 
bring to the task—for what they bring 
will certainly affect their criteria of
simplicity, the kinds of patterns they 
will discover, and how difficult it will 
be for them to discover them. We 
assume the subjects have in memory 
the English alphabet, and the alpha- 
bet backwards. (There are some
alternatives to the latter assumption,
but we make it, at present, for sim- 
plicity.) We assume the subjects 
have the concept of “same” or “equal” 
—e.g., c is the same as c. We assume
they have the concept of “next” on a 
list—e.g., d is next to c on the alpha- 
bet, and f to g on the backward alpha- 
bet. We assume they are able to pro- 
duce a cyclical pattern—e.g., to cycle 
on the list a, b in order to produce 
abababababa. ... Finally, we as- 
sume that they are able to keep track 
of a small number of symbols in im- 
mediate memory—for present pur- 
poses, we need to assume only the 
capacity to keep track of two symbols 
simultaneously. We may call these 
the first and second symbols in im- 
mediate memory, respectively. 

Now, using a simple language cap- 
able of handling only the concepts 
that have just been described, repre- 
sentations can be constructed for all 
of the serial patterns in Table 1, and 
many others. It will be easiest to 
show how this is done by considering 
four examples of gradually increasing 
complexity. 

Pattern 3: atbataatbat_. This
sequence can be described most simply 
if we mark it off in periods of three- 
letter lengths: atb ata atb at_. 
Having done this, we observe that the 
first position in each period is occupied 
by an a, the second position, by a t. 
We refer to these patterns as simple 
cycles of a’s and t’s, respectively. The 
third position in the period is occupied 
by the cycle ba ba. . . . We refer to 
a pattern of this kind as a “cycle on 
the list b,a.” Hence, we can describe 
the entire Pattern 3 by the notation: 


3. [a, t, (b,a)]


Pattern 2: aaabbbcccdd_. Again, 
this sequence can be marked off in 
periods of three letters; but in this
case, there are simple relations among
the letters within each period (they 
are, in fact, identical). One way in 
which we can describe Pattern 2 is by 
the notation: 


[M1 = Alph; a]
2. [M1, M1, M1, N(M1)]


The notation is interpreted as follows: 
we set a variable, M1, equal to the 
first letter, a, of the alphabet. Each 
period is executed by producing M1 
three times, and then replacing M1 by 
the next (N) letter of the alphabet. 
An alternative representation of this 
pattern is shown as Example 2b in 
Table 2. We shall refer to it later.

Pattern 13: wxaxybyzczadab_. This
sequence is more complicated than the
previous two, but can again be
analyzed in terms of a period of three
symbols, with internal relations among
the first two symbols of each period. 
One description of this pattern is: 


[M1=Alph; w: M2=Alph; a] 
13. [M1, N(M1), M1, M2, N(M2)] 


Here are two variables, M1 and M2, 
corresponding to alphabetic sequences; 
but the M1 sequence begins with w, 
while the M2 sequence begins with a. 
Notice that when we come to the end 
of the list for such a sequence, we be- 
gin again at the beginning—z is fol- 
lowed by a. Thus, these alphabetic 
sequences are identical with what we 
have called cycles on a list; in this 
case, the list is the alphabet, (a . . .z). 

Pattern 15: pononmnmlmlk_. Our 
final example, also based on a period of 
three, can be represented in the same 
notation as the others, with the addi- 
tion of one new operator: 


[M2 = M1 = Balph; p]


15. [M2, N(M2), M2, N(M2), M2,
N(M1), E(M2,M1)]


There are two variables, M1 and M2,
which follow the sequence of the
alphabet backwards (Balph), start-
ing with the letter p. The variable
M2 is produced, then the next letter
of the sequence, then the next; then
the next in sequence to M1 is found,
and M2 is set equal (E) to the new
M1.


TABLE 2
PATTERN DESCRIPTIONS OF THE TEST PROBLEMS


Example  Initialization                   Sequence iteration
1a.      M1 = (c, d); c                   M1, N(M1)
1b.                                       c, d
2a.      M1 = Alph; a                     M1, M1, M1, N(M1)
2b.      M1 = Alph; a                     [...], N(M1)
3.       M1 = (b, a); b                   a, t, M1, N(M1)
4.       M1 = Alph; a                     M1, N(M1), M1, N(M1), m
5.       M1 = M2 = Alph; d                M1, N(M1), M1, N(M1), M1, N(M1), M1, N(M2), E(M1, M2)
6a.      M1 = (q, p); q: M2 = (a, b); a   M1, N(M1), x, M2, N(M2)
6b.                                       q, x, a, p, x, b
7.       M1 = Alph; d: M2 = Balph; c      a, M1, N(M1), u, a, M2, N(M2), u
8.       M1 = Alph; a                     m, M1, N(M1), M1
9.       M1 = Alph; r                     u, M1, N(M1), t
10.      M1 = Balph; y                    a, b, M1, N(M1)
11.      M1 = Alph; r: M2 = Alph; c       M1, N(M1), M1, M2, N(M2), M2
12.      M1 = Alph; n: M2 = Alph; p       M1, N(M1), M2, N(M2), a
13.      M1 = Alph; w: M2 = Alph; a       M1, N(M1), M1, M2, N(M2)
14.      M1 = Alph; j: M2 = Alph; q       M1, N(M1), M1, M2, N(M2), M2
15.      M1 = M2 = Balph; p               M1, N(M1), M1, N(M1), M1, N(M2), E(M1, M2)


Note.—Alph = alphabet; Balph = alphabet backwards.

Table 2 gives pattern descriptions 
for the entire set of 15 test problems in 
the notation we have just introduced. 
We remark again that the descriptions 
are not necessarily unique—in many, 
if not all, cases, it is fairly easy to find 
alternative descriptions of the pat- 
terns. Those provided appear in- 
tuitively to be the simplest among the 
alternatives we have found. In the 
case of Patterns 1, 2, and 6, we give 
two alternatives. 

The pattern descriptions contain all 
the information contained in the se- 
quences from which they were de- 
rived. They can be used to recon- 
struct the sequences. More than 
that, they can be used to extrapolate 
the sequences indefinitely—hence they 
can be used to perform the task with 
which the subjects were confronted in 
the Letter Series Completion Test. 
Thus, we may assert that anyone who
has learned the pattern description
has learned the concept embodied in
the corresponding sequence. Our
central hypothesis about human con-
cept attainment in situations involv-
ing serial patterns is the converse of
this assertion, namely: subjects at-
tain a serial pattern concept by generat- 
ing and fixating a pattern description 
of that concept. 
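
The claim that a pattern description suffices to extrapolate its sequence can be made concrete with a small sketch. This is an illustration in Python, not the IPL-V program of the theory; the function `extrapolate`, the step encoding (`"P"`, `"N"`, `"E"` tuples), and the names `pattern_13` and `pattern_15` are assumptions introduced here. The encoding of Pattern 15 follows the prose reading: three letters are produced, then M1 is advanced and M2 is set equal to it. The `E` step assumes both variables range over the same list.

```python
import string

ALPH = list(string.ascii_lowercase)   # the alphabet
BALPH = ALPH[::-1]                    # the alphabet backwards


def extrapolate(init, iteration, length):
    """Produce `length` symbols from a pattern description.

    init      -- {variable: (list, starting symbol)}
    iteration -- one period, given as a list of steps:
                 "x"              produce the literal letter x
                 ("P", var)       produce the symbol held in var
                 ("N", var)       move var to the next symbol on its list (cyclically)
                 ("E", dst, src)  set dst equal to src (same list assumed)
    """
    lists = {v: lst for v, (lst, _) in init.items()}
    pos = {v: lst.index(start) for v, (lst, start) in init.items()}
    out = []
    while len(out) < length:
        for step in iteration:
            if isinstance(step, str):
                out.append(step)
            elif step[0] == "P":
                out.append(lists[step[1]][pos[step[1]]])
            elif step[0] == "N":
                v = step[1]
                pos[v] = (pos[v] + 1) % len(lists[v])
            elif step[0] == "E":
                pos[step[1]] = pos[step[2]]
            if len(out) == length:
                break
    return "".join(out)


# Pattern 13: [M1 = Alph; w: M2 = Alph; a]  [M1, N(M1), M1, M2, N(M2)]
pattern_13 = extrapolate(
    {"M1": (ALPH, "w"), "M2": (ALPH, "a")},
    [("P", "M1"), ("N", "M1"), ("P", "M1"), ("P", "M2"), ("N", "M2")],
    15,
)

# Pattern 15: [M2 = M1 = Balph; p], three productions of M2 per period.
pattern_15 = extrapolate(
    {"M1": (BALPH, "p"), "M2": (BALPH, "p")},
    [("P", "M2"), ("N", "M2"), ("P", "M2"), ("N", "M2"), ("P", "M2"),
     ("N", "M1"), ("E", "M2", "M1")],
    13,
)
```

Under these assumptions the first 14 letters of `pattern_13` and the first 12 letters of `pattern_15` reproduce the test items, and the final letter of each is the extrapolated answer.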


GENERATING SEQUENCES 


We have now achieved our first ob- 
jective: to formulate a simple, parsi- 
monious language of pattern descrip- 
tion, based on plausible hypotheses 
about what subjects bring to the serial 
pattern task. Our next tasks are (a) 
to propose a mechanism that would 
enable a subject, holding such a pat- 
tern description in memory, to pro- 
duce and extrapolate a sequence; and 
(b) to propose a mechanism that 
would enable a subject to induct such 
pattern descriptions from segments of 
letter sequences. We consider first 
the possible structure of a sequence 
generator.

Information processing theories al- 
ready exist that seek to explain how 
humans perform certain other tasks, 


TOR O 
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including problem solving—the Gen- 
eral Problem Solver (GPS)—and rote 
memory—the Elementary Perceiver 
and Memorizer (EPAM). In con- 
structing our present theory, we wish 
to avoid creating elaborate mecha- 
nisms ad hoc, and seek, instead, to 
build the hypothesized system from 
the same elementary mechanisms that 
have been used in GPS and EPAM. 
Our language of pattern description 
makes this easy to do, since the proc- 
esses required to produce sequences 
fitting the list descriptions can be 
formulated naturally and simply in 
the list processing language that has 
been used in constructing these earlier 
theories. 

A list processing language, as its 
name implies, is a system of processes 
for acting upon symbolic information 
represented in the form of lists and 
list structures (lists of lists). Among 
the fundamental processes in such a 
language are the process of writing or 
producing a symbol, the process of 
copying a symbol (i.e., writing a sym- 
bol that is the same as the given 
symbol), and the process of finding the 
symbol that is next to a given sym- 
bol on a list. In addition, there are 
processes for inserting symbols in 
lists, deleting symbols from lists, and 
otherwise modifying lists and list
structures.⁴


4 We cannot enter here into a full discussion
of the reasons for supposing that human think-
ing processes are fundamentally list processes.
For a general nontechnical introduction to
this point of view see Miller, Galanter, and
Pribram (1960). The particular list process-
ing language that has been used to define
GPS, EPAM, and the theory set forth in this
paper is IPL-V (Information Processing
Language V). The language is described in
Newell (1961). Investigators who wish more
detail about the program of the Sequence and
Pattern Generators can obtain a program
listing from the authors. The program can be
run on most of the large computers that are
available on university campuses, for example
the Bendix G-20; IBM 704, 709, and 7090;
the CDC 1604; or the Univac-Scientific 1507.

We can see rather immediately that
processes of these kinds will enable the
subject, having stored the pattern de-
scription, to produce and extrapolate
the sequence. By way of example, let
us consider in detail Pattern 9 in
Table 2. To produce the sequence de-
scribed by the pattern, we simply in-
terpret the pattern description, sym-
bol by symbol, as follows:


1. Hold the letter “r” on the list named
“Alphabet” in immediate memory.

2. Produce the letter “u.”

3. Produce the letter that is in immediate
memory (initially, this will be “r”).

4. Put the next letter on the list in im-
mediate memory (on the first round, this
will move the pointer to “s”).

5. Produce the letter “t.”

6. Return to Step 2, and repeat the se-
quence as often as desired.


Any mechanism that follows the
program outlined in Steps 1 through
6 will produce the sequence:
urtustuttuut.... Thus, all that is re-
quired to construct such a mechanism
is to give it the capacity to interpret
the symbols in the pattern description,
and to execute the actions they signify
—actions like “hold in immediate
memory,” “produce,” “find next on
a list,” “repeat.”

The second part of our theory, then,
is a program, written in IPL-V, that
is capable of generating sequences
from pattern descriptions by execut-
ing the elementary list processes
called for by the descriptions. As we
have seen, the program is extremely
simple. We postulate: normal adult
human beings have stored in memory a pro-
gram capable of interpreting and exe-
cuting descriptions of serial patterns.
In its essential structure, the program
is like the one we have just described.

Our main evidence for these asser-
tions is that the program we have
written, containing the mechanisms
and processes we have described, is in
fact capable of generating and extra- 
polating letter series from stored de- 
scriptions. We are not aware that any
alternative mechanism has been hy- 
pothesized capable of doing this. 
Further, the basic processes incor- 
porated in the program are processes 
that have already been shown to be 
efficacious in simulating human prob- 
lem solving and memorizing behavior. 
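
The six steps given above for Pattern 9 translate almost line for line into a short program. The following is a sketch in Python rather than IPL-V, offered only as an illustration; the generator name `pattern_9` and the variable `held` (standing for the symbol in immediate memory) are assumptions introduced here.

```python
import string
from itertools import islice


def pattern_9():
    """Follow Steps 1 through 6 for Pattern 9: u, M1, N(M1), t with M1 = Alph; r."""
    alphabet = string.ascii_lowercase       # the list named "Alphabet"
    held = alphabet.index("r")              # Step 1: hold "r" in immediate memory
    while True:                             # Step 6: repeat as often as desired
        yield "u"                           # Step 2: produce the letter "u"
        yield alphabet[held]                # Step 3: produce the letter in memory
        held = (held + 1) % len(alphabet)   # Step 4: move to the next letter on the list
        yield "t"                           # Step 5: produce the letter "t"


first_twelve = "".join(islice(pattern_9(), 12))
```

The first twelve symbols produced in this way agree with the sequence quoted in the text, and the eleventh symbol is the answer to test Problem 9.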


PATTERN GENERATOR 


We come, finally, to the question of 
how subjects induct a pattern descrip- 
tion from the pattern segment that is 
presented to them. Our answer to 
this question again takes the form of a 
program that is capable of doing just 
this, in cases where the pattern is not 
too complex. We shall describe the 
program, and then consider the rea-
sons for supposing it bears a close
family resemblance to the programs
used by human subjects.

The inputs to the pattern generator
are the letter sequences presented to
the subject. The outputs of the genera-
tor are the corresponding pattern de-
scriptions. By considering what is
involved in translating a sequence
(Table 1) into its pattern description
(Table 2), we can achieve some under-
standing of what is involved in the
generator. Basically a description
characterizes a sequence in terms of
some initial conditions—for example,
the symbol to be stored at the outset
in immediate memory—and some re-
lations among symbols—for example,
that one symbol follows another in
the alphabet. The main task, then,
of the pattern generator is to detect
these initial conditions and relations
in the given sequence, and to arrange
them in the corresponding pattern.

There is gross behavioral evidence
that the subjects accomplish these
tasks by first discovering a periodicity
in the sequence. Sequence 1, for ex-
ample, has a period of two, for every
other symbol is a c. Similarly,
Sequence 2 has a period of three, for
it consists of segments of three equal
symbols each. The pattern generator
seeks periodicity in the sequence by
looking for a relation that repeats at
regular intervals. Thus, it discovers
that the same symbol occurs in every
second position in Sequence 1, and
that the next symbol occurs at every
fourth position starting with the first
in Sequence 5. If this kind of peri-
odicity is not found, the pattern gener-
ator looks for a relation that is inter-
rupted at regular intervals—Sequence
2 provides an example, where the re-
lation of “same” is interrupted at
every third position. Thus, to dis-
cover periodicity in the sequence, the
pattern generator needs merely the ca-
pacity to detect relations like “same”
and “next” with familiar alphabets.
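
The two ways of finding a period described above can be sketched in a few lines. The sketch below is an illustration in Python, not the authors' pattern generator; the function names `repeating_period` and `interrupted_period`, the `max_period` bound, and the requirement of at least two confirming pairs are all assumptions made for the example.

```python
import string

ALPH = string.ascii_lowercase


def same(a, b):
    """The relation 'same': the two symbols are equal."""
    return a == b


def next_on_alphabet(a, b):
    """The relation 'next': b follows a on the (cyclic) forward alphabet."""
    return ALPH[(ALPH.index(a) + 1) % len(ALPH)] == b


def repeating_period(seq, relation, max_period=6):
    """Smallest period p such that, from some position within the period,
    `relation` holds between every pair of symbols p apart; None if not found."""
    for p in range(1, max_period + 1):
        for start in range(p):
            pairs = [(seq[i], seq[i + p]) for i in range(start, len(seq) - p, p)]
            if len(pairs) >= 2 and all(relation(a, b) for a, b in pairs):
                return p
    return None


def interrupted_period(seq, relation):
    """Period p if `relation` holds between adjacent symbols except for
    interruptions occurring after every p-th symbol; None otherwise."""
    breaks = [i + 1 for i in range(len(seq) - 1)
              if not relation(seq[i], seq[i + 1])]
    if len(breaks) < 2:
        return None
    p = breaks[0]
    regular = all(b == p * (k + 1) for k, b in enumerate(breaks))
    return p if regular else None


period_1 = repeating_period("cdcdcd", same)                      # Sequence 1
period_5 = repeating_period("defgefghfghi", next_on_alphabet)    # Sequence 5
period_2 = interrupted_period("aaabbbcccdd", same)               # Sequence 2
```

On the three sequences discussed in the text the sketch returns periods of two, four, and three respectively, using nothing beyond the relations of “same” and “next.”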

Once a basic periodicity has been 
discovered, the details of the pattern 
are supplied in almost the same way— 
by detecting and recording the rela- 
tions—of equal and next—that hold 
between successive symbols within a 
period or between symbols in cor- 
responding positions of successive 
periods. The pattern of Sequence 9,
for example, records that a period of
three was discovered; that the first
position in the period is always oc-
cupied by the same symbol—u, and
the third position always by t. In the
second position, however, each suc-
cessive period has the symbol next in
the alphabet to the second symbol in
the previous period.

A number of different variants of
the pattern generator have been writ-
ten, all of them, however, based on
these same simple relation recognizing
processes. The several variants show
different degrees of success in de-
scribing the 15 test sequences. A 
particular pattern generator may fail 
to describe a given pattern for any
one of several reasons. It may be un-
familiar with an alphabet used in con- 
structing the pattern. It may not 
have a sufficiently wide repertoire of 
relations it can test. It may have in- 
adequate means for organizing and re- 
cording as a coherent pattern descrip- 
tion the relations it discovers. All of 
these reasons for failure can be iden- 
tified in our experiments. 

We would expect that among our
human subjects, also, different levels
of performance on the Letter Series
Tests might be associated with the
same kinds of limitations. We shall
raise this point again when we look
at some of the data on human per-
formance and its comparison with the
computer simulation.

Some information on the perform-
ance of four variants of the program—
A, B, C, and D—is provided in Table
3. The program became progressively
more powerful, Variant A solving 3 of
the 15 problems; Variant B, 6; Vari-
ant C, 7; and Variant D, 13. Except
for Problem 3, all problems solved by
a less powerful variant were solved by
the more powerful variants. There
is no logical necessity for this ordering
relation to hold, but as an empirical
matter it would be rather difficult to
construct a variant that would suc-
ceed on the “hard” problems and fail
on the “easy” ones. Thus, the pro-
grams reveal a “natural” metric of
difficulty—a point we shall discuss
further in the next section.


TABLE 3
PROBLEMS FAILED BY GROUP OF 12 SUBJECTS AND BY VARIANTS
OF THE COMPUTER PROGRAM

[Table body not recoverable from the scan. Rows: Problems 1-15 and
total correct. Columns: Subjects 1-12 (a) and program variants.
Legible totals correct for Subjects 1-7: 15, 14, 12, [illegible], 10, 9, 8.]

Note.—X = problem missed.
a In order of performance.


The pattern generator (we shall
not attempt to distinguish among the
several variants) constitutes the third
part of our theory of human
pattern learning. We postulate: nor-
mal adult human beings have stored
in memory a program, essentially like
the pattern generator just described,
capable of detecting relations and re-
cording a pattern description for a
simple sequence.


EXAMINATION OF SOME 
EMPIRICAL DATA 


Thus far we have been concerned 
primarily with describing a set of pro- 
grams capable of doing what human 
subjects demonstrably can do: dis-
cover, remember, and produce simple 
serial patterns. We have been able 
to find some quite simple mechanisms, 
incorporating elementary symbol ma- 
nipulating and list manipulating proc- 
esses, that have this capacity. The 
next stage in inquiry is to see what 
light these mechanisms—hypothesized 
as an explanation of human perform- 
ance in these tasks—can cast on the 
behavior of subjects in the laboratory ; 
and conversely, to seek more positive 
tests of the validity of the explanation. 

The data we shall discuss here were
obtained by giving the Letter Series
Completion Test to two sets of sub-
jects. Since our main interest was in
analyzing differences in difficulty
among problems rather than differences
in ability among subjects, no special
care was taken to obtain samples 
representative of any particular popu- 
lation. The first group, 12 subjects, 
ranged from college graduates to 
housewives. The second group, 67 sub- 
jects, comprised an entire class of high 
school seniors. Problems 1-15 of 
Table 1 were administered to the 12 
subjects individually and to the 67 
subjects as a group. 

The three columns of Table 4 show
for each problem: (a) the number of
subjects in the first group who solved
it, (b) the number of subjects in the
second group who solved it, and (c)
whether it was solved (S), or left un-
solved (U) by Variant C of the com-
puter program.⁵ The problems of less
than median difficulty, as defined by
the numbers of subjects solving them,
are shown in boldface type in Columns
1 and 2.


TABLE 4
PROBLEM DIFFICULTY: COMPARISON OF
HUMAN SUBJECTS WITH VARIANT C
OF PROGRAM


           Number of subjects
           obtaining correct solution
Problem    ----------------------------
number     Group of 12    Group of 67    Program ᵃ
   1           12             65            S
   2           12             61            S
   3           10             60            U
   4           12             57            S
   5            2             45            U
   6            8             48            S
   7            5             27            U
   8            9             49            S
   9            5             43            S
  10            9             51            S
  11            4             39            U
  12            5             42            U
  13            7             43            U
  14            6             48            U
  15            5             34            U


Note.—Problems shown in boldface were below
median in difficulty for subjects in question.
ᵃ S = solved, U = failed.

The three columns of Table 5 show:
(a) the average time per problem for
those subjects in the group of 12 who
obtained the problem solution, (b) the
average time spent by all subjects in
the first group on each problem, and
(c) the time spent per problem by
Variant D of the program (which
solved 13 of the 15 problems). The
times that were below median in
Columns 1 and 2 are shown in bold-
face type.


5 Variant C was used for this comparison 
because it solved about half (7 out of 15) of 
the problems, thus permitting them to be 
divided evenly into “easy” and “hard.”


TABLE 5
COMPARISON OF 12 SUBJECTS WITH
VARIANT D OF PROGRAM:
TIME PER PROBLEM


           Seconds per subject          Seconds
Problem    -------------------------    ---------
number     Successful     All           Program D
           subjects       subjects
   1           6.0           6.0            9
   2           3.8           3.8           28
   3          24.7          23.0           18 ᵃ
   4          16.8          16.8           23
   5          27.5          40.6           35
   6          37.0          31.4           19
   7          37.8          49.2           19 ᵃ
   8          24.9          24.9           23
   9          18.8          28.7           17
  10          20.9          20.7           18
  11          21.5          37.3           35
  12          49.8          49.0           24
  13          61.7          65.5           29
  14          41.2          47.8           36
  15          48.0          56.8           30


Note.—Times below median shown in boldface.
ᵃ Program D failed to solve problem.


Considering both tables together, 
we have four measures of problem 
difficulty for the human subjects—two 
measures of numbers of subjects who 
solved the problems, and two measures 
of problem solving time. Not sur- 
prisingly, there is a high level of agree- 
ment among the four measures as to 
which problems were easy, and which 
hard. On all four measures, Problems 
1, 2, 3, 4, 8, and 10 ranked below the 
medians in difficulty, as did Problem 
6 on two measures (number solving) 
and Problem 9 on two measures 
(problem solving time). Problems 14 
and 11 were each below median on one 
measure, while Problems 5, 7, 12, 13, and 
15 were above the median in difficulty 
on all four measures. For purposes 
of gross comparison, we will call the 
eight problems in the first two groups 
the easy ones, and the seven problems 
in the last two groups the hard
ones.

To see whether there is anything in
our theory that would account for 
these differences in difficulty, we ex- 
amine the pattern descriptions in 
Table 2. By common sense inspection 
of the pattern descriptions, it is clear 
that the easier problems have simpler 
descriptions—we could have made an 
almost perfect prediction of which 
problems would be above median in 
difficulty simply by counting the num- 
ber of symbols in their pattern de- 
scriptions. (There is some ambiguity 
for Problem 6, which is neither as 
difficult as Description 6a would sug- 
gest, nor as simple as Description 6b 
would imply.)

But the lengths of the descriptions 
do not tell the whole story. If we now 
examine the patterns more closely we 
see that all of the patterns for the 
hard problems, and none of the pat- 
terns for the easy ones (except 6a) 
call for two positions in immediate 
memory. To extrapolate these more 
difficult sequences, the subject has to 
keep his place in two separate lists, 
but only in one at most, for the easier 
sequences. Moreover, to build up the 
patterns for the former sequences, the 
subject had to detect and keep track of 
relations on two distinct lists, as 
against one for the latter sequences. 

An alternative hypothesis would be 
that the length of period was the 
source of difficulty. It is true that all 
the patterns with a period of four or 
more symbols (Patterns 5, 7, 11, and 
14) are among the hard ones; but 
Patterns 12, 13, and 15, which have 
periods of three, are hard; while 
Patterns 3, 4, 8, and 10, which also 
have periods of three, are easy. Al- 
though the evidence is far from con- 
clusive, number of positions in im- 
mediate memory appears to be more 
closely related to difficulty than length
of period.

We cannot undertake here a detailed
analysis of the errors made by our sub- 
jects, but we can make one observa- 
tion that helps explain why Problem 
9 appeared rather more difficult (in 
terms of failure to solve, not solution 
time) than its pattern description 
would have predicted. The main proc-
ess, we have hypothesized, for solv-
ing these problems is to detect rela-
tions between adjoining symbols, or
symbols in corresponding positions of 
successive periods. But towards the 
end of the sequence in Problem 9—the 
symbols tuttu—there are a number of 
spurious relations of “equals” and 
“next” that are not part of the pat- 
tern. Discovery of these relations, 
and failure to check them through the 
earlier part of the sequence, would 
lead to wrong answers. For example, 
the partial sequence given above could 
reasonably be extrapolated by an- 
nexing the symbol t. 
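
The point about spurious relations can be illustrated with a minimal sketch. It assumes, for illustration only, that "equals" and "next" are tested between each symbol and the symbol a fixed number of positions later; the function names are hypothetical and are not taken from the program described in this paper. "Next" is treated here as the following letter of the ordinary (noncircular) alphabet.

```python
import string

ALPHABET = string.ascii_lowercase


def relation(a, b):
    """Name the relation from letter a to letter b, or None if neither applies."""
    if a == b:
        return "equals"
    if ALPHABET.index(b) == ALPHABET.index(a) + 1:
        return "next"
    return None


def relations_at_lag(sequence, lag):
    """Relations between each symbol and the symbol `lag` positions later."""
    return [relation(sequence[i], sequence[i + lag])
            for i in range(len(sequence) - lag)]


# Adjacent-symbol relations in the closing symbols of Problem 9.
tail_relations = relations_at_lag("tuttu", 1)
```

Applied to the fragment tuttu, three of the four adjoining pairs satisfy "next" or "equals," which is the kind of local regularity that would mislead a subject who did not check it against the earlier part of the sequence.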


COMPARISONS WITH PROGRAM 
PERFORMANCE 


We have seen that one part of our 
theory—the pattern descriptions— 
allows us to make predictions about 
the relative difficulty of serial pattern 
problems for human subjects. It 
may be objected that the test is sub- 
jective, since we cannot know that the 
patterns used by the subjects are the 
same as those we have written down. 
The objection would be more con- 
vincing if it could be shown that the 
patterns could be described in a man- 
ner quite different from the one we 
have proposed. But there is addi- 
tional evidence we can bring to bear 
on the question, derived from the 
programs used to generate the patterns 
—the third part of our theory. 

Of the several variants of the pat- 
tern generating program we have 
studied, Variant C will be considered
here, because it solved 7 of the 15
problems, hence found about half of 
them “easy” and half “hard.” From 
Table 4 it can be seen that the pro- 
gram solved none of the problems we 
have previously labeled hard, and all 
but one (Problem 3) of the problems 
previously labeled easy. Hence the 
pattern generator also provides ex- 
cellent predictions of the relative 
difficulty of the problems for human 
subjects. 

A closer investigation of the pro- 
gram’s failure with the hard problems 
showed that the difficulties arose 
specifically in keeping track of the 
lists associated with distinct positions 
in immediate memory. The program 
was incapable of organizing the parts 
of the pattern into an overall structure 
when two immediate memory posi- 
tions were involved. We take this as 
additional evidence for the plausibility 
of our hypothesis that this was the
locus, also, of the difficulties the less
successful human subjects encoun- 
tered. A more powerful version of the 
program, Variant D, overcame most 
of these difficulties, and failed only on 
Problems 3 and 7. A still more 
powerful version has solved all but 
Problem 7. 

A few more words are in order about 
Problems 3 and 6. Problem 6 was
solved by Variant C relatively rapidly, 
but the pattern discovered was 6b 
rather than 6a. With respect to 
Problem 3, we must simply say that 
the program of Variant C was dif- 
ferent from that of most of the human 
subjects. (It might be mentioned, 
however, that the fourth and sixth 
ranking in the group of 12 adult sub- 
jects also missed this problem.) The 
occurrence of a following b in Se- 
quence 3 led the program to attempt 
to use the relation of “next on the 
backward alphabet” instead of de-
scribing the pattern in terms of the
circular list (a, b). It did not do
enough checking to discover and cor- 
rect its error. The majority of the 
human subjects either did not make 
that error, or were able to correct it. 

In the third column of Table 4 we 
have recorded the times spent by 
Variant D on each of the 15 problems. 
There is a modest positive correla- 
tion between the times taken by the 
program and the subjects (Column 1), 
but the agreement cannot be claimed 
to be close. Analysis suggests that 
the time required by the program de- 
pended much more on length of period 
than did the time required by the 
human subjects. If we consider only 
the nine patterns of Period 3, the cor- 
relation of times is very much im- 
proved. Since the theory does not 
postulate that the relative times re- 
quired for the several elementary proc- 
esses will be the same for the com- 
puter as for human subjects, there is 
no real justification for comparing 
human with computer times between 
tasks that have quite different “mixes” 
of the elementary processes. 
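
The size of the agreement between program and subject times can be checked with a short sketch. The values are transcribed from the first and third columns of Table 4 as printed; the choice of the product-moment coefficient is an assumption, since the text does not say which index of correlation was used. The program times for Problems 3 and 7 are times to failure. The nine patterns of Period 3 are not all identified in this section, so only the overall coefficient is computed.

```python
import math

# Seconds per problem, transcribed from Table 4 (Problems 1 to 15).
subject_seconds = [6.0, 3.8, 24.7, 16.8, 27.5, 37.0, 37.8, 24.9,
                   18.8, 20.9, 21.5, 49.8, 61.7, 41.2, 48.0]
program_seconds = [9, 28, 18, 23, 35, 19, 19, 23,
                   17, 18, 35, 24, 29, 36, 30]


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r_all = pearson_r(subject_seconds, program_seconds)
```

On these figures the coefficient is positive but well short of unity, in keeping with the description of the agreement as modest.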

Among the patterns of Period 3, 
Pattern 2 took the human subjects a 
very short time, but the program a 
rather long time. We would conjec- 
ture that in this case, the program was 
slow because it lacked a concept that 
most of the subjects had—the con- 
cept of repeating a symbol a fixed 
number of times (see Tables 1 and 2). 
Thus, while the program discovered 
Pattern 2a, we believe that most sub- 
jects represented the pattern in a 
manner more nearly resembling 2b. 

We have mentioned these details 
because they illustrate how a theory 
of the sort we have proposed permits 
one to examine the microstructure of 
the data, and to develop quite specific 
hypotheses about the processes that 
human subjects use in performing
these tasks. Of course, to test these
hypotheses we shall require additional 
observations, particularly observations 
like those we have reported on prob- 
lem solving tasks, which record not 
simply the success or failure of the 
subject, but as much detail as can be 
detected of behavior during the prob- 
lem solving process. The possibility 
of confronting the theory with such 
detail greatly facilitates its testing and 
improvement.


CONCLUSION

In this paper we have set forth a 
theory, comprising a language for 
pattern description and a program, to 
explain the processes used by human 
subjects in performing the Thurstone 
Letter Series Completion task. We 
have devised measures of problem 
difficulty based on the pattern descrip- 
tions and upon the ability of variants 
of the program to solve particular 
problems. These measures of problem 
difficulty correlate well with measures 
derived from the behavior of the hu- 
man subjects. By analysis of the 
pattern descriptions and programs, we 
have been able to form, and partially 
test, some hypotheses as to the main 
sources of problem difficulty. By de- 
tailed comparison with the human 
behavior, we have formed some con- 
jectures about the detail of processes 
that can be subjected to additional 
tests in the further development of 
the theory. 

We conclude on the basis of the 
evidence presented here that the 
theory provides a tenable explanation 
for the main pattern forming and 
pattern extrapolating processes in- 
volved in the performance of the letter 
series completion task. Different 
variants of the theory can be used to 
account for individual differences 
among human subjects in performing 
this task.


A RESONANCE THEORY OF “MICROVIBRATIONS”¹


JAMES G. L. WILLIAMS 
Nebraska Psychiatric Institute, University of Nebraska 


A reanalysis of Rohracher's original data shows that the periodic micro- 
vibration of the body surface recorded by him does not require the 
assumption that there exists any physiological event occurring at that 
frequency. A “resonance” theory concerning the origin of micro- 
vibrations is proposed and this suggests that the frequency is deter- 
mined by the physical characteristics of the particular body-transducer 
system used for its recording. The microvibration amplitude is shown 
to be a sensitive psychophysiological measure of muscle tension and
gross bodily activity. The implications of the theory with regard to the
determination of tremor frequencies are discussed.


Rohracher (1946, 1949, 1952, 1954, 
1955, 1958a, 1958b, 1959a, 1959b, 
1960) has claimed that the entire sur- 
face of the human or homothermic 
animal body exhibits minute con- 
tinuous vibrations. During the past 
several years a number of workers 
(Denier, 1957; Heller-Jahnel, 1959; 
Luhan, 1953; Nirrko, 1961; Sugano, 
1957; Swarofsky, 1958) have given 
support to Rohracher’s (1954) con- 
tention that, 


with suitable apparatus a system of contin- 
uous microscopic vibrations can be demon- 
strated in the human and animal body. In a 
healthy human being in the condition of 
greatest relaxation its magnitude is 1-5 μ and 
its frequency 6-12 vibrations per second . . . 
we are not concerned with electrical processes 
but with a microscopically small rhythmic 
vibration system of the organism [p. 1]. 


¹ ... study, to R. W. Russell for his generous 
advice and helpful criticism, and to Barbara 
Williams for her valuable technical assistance. 


Rohracher (1954) has suggested 
that almost any device which will 
convert mechanical movements into 
electrical impulses together with an 
amplifier and recorder having suitable 
frequency responses (3-30 cycles per 
second) can be used to record the 
microvibrations. Apparently he has 
successfully used a wide variety of 
makeshift transducers adapted, for 
example, from carbon microphones or 
phonograph pickups. However, for 
much of his work he has especially 
favored a commercially designed 
electrodynamic vibration transducer 
(Philips Type GM 5520) which, in 
use, he suspends from a pulley so that 
a counterpoise may be used to reduce 
its apparent weight and allow the 
probe to rest on the subject's limb, 
etc. with a constant pressure. He 
scores the resulting alpha-like wave- 
form by measuring its double ampli- 
tude and counting the number of 
apparent peaks occurring during a 
given interval. 

Figure 1 reproduces the waveform 
obtained by recording the amplified 
output from a counterbalanced trans- 
ducer (Acos VP2) resting on the 
subject’s forearm, the general experi- 
mental arrangement being similar to 
that used by Rohracher. 

Fig. 1. Microvibration waveform obtained by recording the amplified 
output from a counterbalanced transducer. 


Figure 2 shows our development of 
Rohracher’s original instrumentation: 
In addition to recording the simple 
raw waveform (Recorder 2) we have 
added an amplitude-integrated record 
(Recorder 1) and a frequency analyser 
record (Recorder 3). The use of these 
additional recordings is discussed be- 
low in the section concerning the ex- 
perimental validation of the resonance 
hypothesis. 


Fig. 2. Diagram showing the general experimental arrangement used to record 
microvibrations. (The amplified output from the transducer is recorded 
directly by Recorder 2, after amplitude integration by Recorder 1, and after 
frequency analysis by Recorder 3.) [Diagram labels: subject’s forearm, 
transducer, counterpoise, integrator (amplitude), frequency analyser, 
Recorders 1, 2, and 3.] 


To the 1954 summary of his earlier 
publications Rohracher has added 
much additional material confirming 
and extending his original findings. 
While the very extensive work of 
Rohracher and his collaborators can- 
not adequately be summarized here, 
it is of present importance to restate 
some of their basic conclusions. The 
most fundamental of these has been 
quoted above. Others include the 
following: 


1. While the microvibrations can be 
detected throughout the body, their 
origin is in the striped musculature. 

2. Contraction of the striped mus- 
cles increases the amplitude many 
times but does not affect the fre- 
quency. 

3. Within any given recording the 
frequency is remarkably constant. 


Rohracher (1954) concluded that, 


the body vibrations fulfill two important 
biological functions with the expenditure of 
an extraordinarily small amount of energy: 
Keeping the body of the warm-blooded animal 
at a constant temperature and keeping the 
musculature in a constant state of readiness, 
thereby making possible rapid and positive 
motor reactions [p. 19]. 


Both he and others (Nirrko, 1961) 
have been impressed by the similarity 
of the EEG alpha and microvibration 
waveforms and frequencies and have 
implied the existence of a causal con- 
nection. Kennedy (1959) has sug- 
gested that alpha rhythm may arise 
from 


mechanical oscillation of the gel of the living 
brain, not necessarily from synchronization 
of neural activity directly [p. 352]. 


This viewpoint has been strongly 
criticized by Oswald (1961) and 
Rosner (1961). 

The purposes of the present paper 
are to suggest: 


1. An alternative hypothesis to 
account for the occurrence of the 
microvibration phenomenon. 

2. That the microvibration fre- 
quency is determined largely by the 
physical characteristics of the par- 
ticular transducer system used (and 
can therefore have no direct relation 
to the alpha rhythm). 

3. That the microvibration am- 
plitude varies according to the amount 
of energy being imparted to the 
transducer and has value as a sensitive 
psychophysiological measure of the 
degree of muscle tension. 


A RE-EVALUATION OF ROHRACHER’S 
RESULTS 


While it is clear that Rohracher has 
arrived at his final conclusions only 
after a careful and detailed considera- 
tion of his very extensive experimental 
work, it is unfortunate that, for the 
most part, his approach has been such 
as to give qualitative results rather 
than to provide more than a minimum 
of quantitative data. (It may be that 
Rohracher recognizes this when he 
chooses to state his final conclusions, 
not as such, but as “hypotheses.”) 
For this reason, it may be that the 
most economical method of appraisal 
would be first to examine the experi- 
mental evidence for his basic conten- 
tion that the striped muscles are the 
source of microvibrations having a 
frequency of 6-12 vibrations per 
second and an amplitude of 1-5 μ, 
rather than by attempting a critical 
survey of the very many contributory 
experiments on which his findings are 
based. 

Of the two dependent variables 
which he has investigated—amplitude 
and frequency—his findings about 
frequency are of central importance to 
his theories. That muscles exhibit 
movements of various amplitudes is a 
truism; that they show continuous 
periodic movements of a constant 
frequency is a most important claim 
with far-reaching implications. Now, 
it seems that there are three out- 
standing facts about the frequency: 
(a) It always lies within the approxi- 
mate range of 6-12 cycles per second; 
(b) within any one record it is strik- 
ingly constant; (c) the variation 
between records (i.e., between tests 
given on different occasions, even 
when these are all taken from the 
same subject) is apparently random 
and not correlated with any other 
variable investigated by Rohracher. 

In his attempt to demonstrate a 
physiological correlate of the fre- 
quency Rohracher makes use of only 
the first two of these observations, 
arguing that they find a counterpart 
in several authoritative experimental 
findings about muscle action po- 
tentials. The third he dismisses by 
assuming that the experimental situa- 
tion on each occasion is identical and 
suggesting that because the two 
series of consecutive daily records 
taken from each of two subjects were 
not correlated, the variation must be 
due to physiological (and not en- 
vironmental) causes. In fact, he 
never takes a sufficiently lengthy 
single record or long enough series of 
briefly spaced records to relate the 
second and third of the above ob- 
servations. He does not formally 
investigate the reliability of his meas- 
ures and none of his data can be 
reanalyzed to provide an indication 
of this. 

If we reject Rohracher’s assumption 
of reliability it is clear that there are 
at least two possible sources of the 
variability in addition to the one he 
suggests: (a) differences in the precise 
location of the transducer for each 
record; (b) differences associated with 
the transducer assembly itself. 

Now, these differences would prob- 
ably reveal themselves most clearly 
if, in the first case, a number of 
records were taken from each of 
several very different sites and, in the 
second case, a series of records from 
each of two different transducers 
were compared. Nothing of this kind 
has been done in any systematic way. 
Nevertheless, in his 1949 monograph, 
Rohracher does provide us with 
nearly a hundred reproductions of 
actual records. An examination of 
these shows us that the two trans- 
ducers most frequently used are the 
electrodynamic and the piezoelectric 
ones. Moreover, there are several 
records taken with the one transducer 
which can be matched (for location) 
with records taken with the other (see 
Table 1). 

Obviously, there are severe limita- 
tions to this kind of retrospective 
analysis. For example, as the descrip- 
tion of each record is not always 
definitive with regard to either the 
subject or the transducer used, the 
assumption that the scores are un- 
correlated may not be 
altogether justified. In spite of these 
difficulties, however, the table does 
tend to support the hypothesis that 
the frequency is dependent on the 
transducer assembly used: The dis- 
tributions of frequencies associated 
with each of the two transducers do 
not overlap; the location of the trans- 
ducer has no obvious effect. 


TABLE 1 

COMPARISON OF MICROVIBRATION FREQUENCIES OBTAINED FROM TWO DIFFERENT 
TRANSDUCERS: ONE CRYSTAL AND ONE ELECTRODYNAMIC 


                    Crystal transducer        Electrodynamic transducer
                    Frequency    Figure       Frequency    Figure
Location                         number                    number

Forearm                7.0        16c           10.0         9b
                       7.0        23b            9.5        18a
                       7.0        25a           10.0        34a
                       8.0        26b           13.0        34b
                                                12.0        36b
Forearm (tensed)       7.5        25b           10.5        24a
                       8.5        26a            9.5        24b
Thigh                  7.0        16b           11.0        19b
                       6.5        23a            9.0        36c

M                      7.36                     10.50

t = 49.45*


Note. The frequencies were estimated from reproductions of records 
included in Rohracher's 1949 monograph. 

* p < .001. 

If this analysis is correct, then 
insofar as we are concerned with fre- 
quency we must be dealing with 
individual differences between two 
physical systems rather than between 
two physiological ones (as Rohracher 
supposed) and it will be necessary to 
examine such a system in detail. 


RESONANCE HYPOTHESIS 


Figure 3 is a schematic representa- 
tion of a transducer in contact with a 
vibrating object. The vibrating part 
of the transducer (mass = m₁) is 
coupled to the stationary part through 
a spring (spring constant = k₁) and 
the system is necessarily a damped one 
(damping factor = r₁). (In absolute 
vibration transducers the stationary 
part is the seismic mass; in relative 
transducers it is the housing.) It is 
the movement of the mass m₁, fol- 
lowing precisely the movement of the 
object, which gives rise to electrical 
changes which can be amplified and 
recorded in the ways described by 
Rohracher. If the object were, say, 
a muscle unit, the transducer would 
enable its contractions to be recorded 
accurately, providing that the magni- 
tudes of the constants m₁, k₁, and r₁ 
were very small compared with those 
of the constants m, k, and r of the 
muscle unit. If the constants relating 
to the transducer were comparatively 
large, however, a serious reactive error 
would occur (van Santen, 1953), a 
difficulty apparently not recognized 
by Rohracher. 

Fig. 3. Schematic representation of a trans- 
ducer in contact with a vibrating object. 


Further, if an isolated transducer is 
given a single abrupt movement, the 
movable part of the transducer will be 
caused to oscillate with a diminishing 
amplitude. The initial amplitude of 
the oscillation will be determined by 
the amount of energy involved; and 
the frequency will be determined by 
the values of all three constants, m₁, 
k₁, and r₁. If a second movement is 
imparted to the transducer before the 
oscillation due to the first has com- 
pletely decayed and then this is 
followed by a third and so on, the 
original oscillation will be maintained 
with a varying amplitude, the value of 
which, at any given moment, is a 
measure of the kinetic energy of the 
system. It follows from this that the 
total area of the recorded curve is 
proportional to the amount of energy 
which has been fed into the system. 
Now, this natural or resonant fre- 
quency is of some importance (e.g., 
the amplitude response is usually 
linear only at frequencies higher than 
the natural frequency). That for the 
Philips’ electrodynamic transducer is 
quoted by the makers as being 12 ± 2 
cycles per second—which is appreci- 
ably greater than the mean range of 
frequencies of body vibrations re- 
corded by Rohracher. Superficially, 
this would make it appear that the 
microvibration waveform could not 
be attributed to any resonance phe- 
nomenon associated with the trans- 
ducer but that it may have a true 
physiological correlate. However, if 
we now refer again to Figure 3 and 
imagine a single movement being im- 
parted in this instance to the object 
it can be seen that again the natural 
frequency of the system will be re- 
corded, but in this case it will be the 
natural frequency of the entire system 
which is arrived at by considering m, 
k, and r as well as m₁, k₁, and r₁, and 
this frequency will almost certainly 
differ from that of the transducer 
alone. More specifically: when a 
transducer is placed on the surface 
of the body, the skin undergoes an 
elastic deformation which introduces 
the effect of a spring; that part of the 
body undergoing deformation has a 
certain mass and damping effect asso- 
ciated with it; the entire physical 
system, including the transducer, now 
has a natural frequency determined by 
the physical characteristics of both 
the body and the transducer. If a 
series of aperiodic impulses are im- 
parted to the system a recording from 
the transducer will show a waveform 
having a frequency which is that of 
the combined system and an ampli- 
tude which varies according to the 
occurrence and magnitude of the 
aperiodic impulses. 
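
The argument of this paragraph can be illustrated numerically. The following is a minimal sketch, assuming that the combined body-transducer system may be lumped into a single mass, spring, and damper obeying m·x'' + r·x' + k·x = 0; the parameter values, the function names, and the use of randomly timed velocity kicks to stand for the aperiodic impulses are all illustrative assumptions, not measurements or procedures from this study.

```python
import math
import random


def damped_frequency(m, k, r):
    """Free-oscillation frequency in Hz of m*x'' + r*x' + k*x = 0 (underdamped case)."""
    disc = k / m - (r / (2.0 * m)) ** 2
    if disc <= 0.0:
        raise ValueError("not underdamped: the system does not oscillate")
    return math.sqrt(disc) / (2.0 * math.pi)


def simulate_kicked_resonator(m, k, r, duration=30.0, dt=0.0005,
                              kick_rate=5.0, seed=0):
    """Displacement trace of the resonator under randomly timed velocity kicks."""
    rng = random.Random(seed)
    x, v = 0.0, 0.0
    trace = []
    for _ in range(int(duration / dt)):
        if rng.random() < kick_rate * dt:      # an aperiodic impulse arrives
            v += rng.uniform(-1.0, 1.0)
        a = -(k * x + r * v) / m               # restoring plus damping force
        v += a * dt                            # semi-implicit Euler step
        x += v * dt
        trace.append(x)
    return trace


def zero_crossing_frequency(trace, dt):
    """Estimate the dominant frequency in Hz from the number of sign changes."""
    crossings = sum(1 for a, b in zip(trace, trace[1:]) if a * b < 0.0)
    return crossings / (2.0 * len(trace) * dt)


# Hypothetical constants giving a natural frequency close to 10 cycles per second.
m = 1.0
k = (2.0 * math.pi * 10.0) ** 2
r = 2.0 * 0.03 * math.sqrt(k * m)
trace = simulate_kicked_resonator(m, k, r)
estimate = zero_crossing_frequency(trace, dt=0.0005)
```

Although the impulses in this sketch have no periodicity of their own, the recorded trace oscillates at approximately the natural frequency fixed by m, k, and r, while its amplitude rises and falls with the arrival of the impulses.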

It would seem to be reasonable to 
suppose that the higher the natural 
frequency of the transducer, the 
higher would be the natural frequency 
of the combined body-transducer sys- 
tem. Figure 8 in Rohracher’s 1949 
monograph is of a record taken by 
means of a “loudspeaker system with 
a high natural resonance.” The fre- 
quency appears to be about 15 cycles 
per second. 

Partly in an attempt to validate his 
theory Rohracher refers to some 
experiments which do not make use 
of electromechanical transducers to 
record the vibrations and which he 
regards as confirmation of the exist- 
ence of microvibrations because of the 
dissimilarity of the pickup devices. 
Typical of these experiments is that 
carried out by Marko (reported in 
Rohracher, 1952) in which a beam of 
light passed through a prism placed 
on the body surface was used as a 
means of recording the vibrations. 
Again referring to Figure 3, a reso- 
nance theory would suggest that the 
technique can be analyzed as before: 
the natural frequency is here de- 
termined solely by the constants 
m₁, m, k, and r. 

The point has now been reached 
where it appears that the present hy- 
pothesis concerning the origin of the 
microvibrations may be an acceptable 
and economical alternative to the 
explanation put forward by Roh- 
racher. The completion of the ar- 
gument for the acceptance of the 
resonance theory necessitates some 
references to the experimental testing 
of its validity. 


TESTING THE RESONANCE 
HYPOTHESIS 


In the absence of damping forces 
the free vibration of a simple me- 
chanical system having one degree of 
freedom is sinusoidal (Manley, 1945). 
However, in practice it is usually 
found that a mass may be capable of 
vibrating in different directions simul- 
taneously. For example, a cube suit- 
ably suspended on springs could be 
capable of assuming six different 
natural frequencies: three linear along 
each axis and three torsional about 
each axis. When a vibrating system 
possesses more than one degree of 
freedom, one vibration is likely to 
influence the others: a phenomenon 
known as “coupling.” The use of a 
transducer in such a situation would 
result in the recording of a complex 
waveform and it follows that this 
waveform would be the sum (in the 
direction determined by the placing 
of the transducer) of the various 
simple sinusoidal components asso- 
ciated with each degree of freedom. 
It is extremely important to note 
that the final complex waveform does 
not obviously display the number of 
component oscillations of which it is 
the sum, or the frequency or amplitude 
of those components. For example, 
in the well-known phenomenon of 
“beating” which occurs when two 
sine waves, the ratio of the frequencies 
of which is nearly unity, are added 
together, the resultant wave has the 
same apparent frequency as the com- 
ponent with the greater amplitude 
and its amplitude varies between the 
sum and difference of the component 
amplitudes, the beat frequency being 
the difference between the frequencies 
of these components. It can be seen 
that if this occurrence were trans- 
ferred to the Rohracher phenomenon 
it would not be correct, for example, 
to assume the existence of a physio- 
logical correlate of the beat frequency 
nor the nonexistence of any periodic 
event other than that indicated by the 
dominating frequency of the major 
component. The interpretation of 
the amplitude of the complex wave- 
form presents similar difficulties. 
Moreover, the example given is that of 
one of the simplest situations; the 
level of complexity of the micro- 
vibration waveform investigated by 
Rohracher precludes even more the 
acceptance of too facile an inter- 
pretation. 
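
The properties of beating referred to above can be stated compactly. The sketch below, with hypothetical amplitudes and frequencies, computes the sum of two sine waves that start in phase, together with its instantaneous amplitude, which follows from adding the two components as rotating vectors.

```python
import math


def summed_wave(t, a1, f1, a2, f2):
    """Sum of two sine waves with amplitudes a1, a2 and frequencies f1, f2 (Hz)."""
    return (a1 * math.sin(2.0 * math.pi * f1 * t)
            + a2 * math.sin(2.0 * math.pi * f2 * t))


def beat_envelope(t, a1, f1, a2, f2):
    """Instantaneous amplitude of the summed wave at time t."""
    phase_difference = 2.0 * math.pi * (f1 - f2) * t
    return math.sqrt(a1 ** 2 + a2 ** 2 + 2.0 * a1 * a2 * math.cos(phase_difference))


def beat_frequency(f1, f2):
    """Rate at which the envelope waxes and wanes."""
    return abs(f1 - f2)


# Hypothetical components: 10 and 11 cycles per second, unequal amplitudes.
a1, f1, a2, f2 = 1.0, 10.0, 0.4, 11.0
largest = beat_envelope(0.0, a1, f1, a2, f2)      # components in phase
smallest = beat_envelope(0.5, a1, f1, a2, f2)     # components in opposition
```

The envelope swings between the sum and the difference of the component amplitudes at the difference frequency, yet nothing in the summed record itself announces that two components, rather than one modulated component, are present.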

It follows from the above that an 
appropriate experimental validation 
of the resonance hypothesis would 
consist of analyzing the microvibra- 
tion waveform and showing that its 
periodic components related only to 
the natural frequencies of the body- 
transducer system. 

It must here be remarked that 
while, in its simplest form, Fourier’s 
Theorem states that any periodic 
variation fulfilling certain conditions 
regarding continuity can be consid- 
ered as the sum of a number of 
sinusoidal variations whose periods 
exhibit a simple relationship, and 
Riemann’s Theorem states that for 
any given variation the equivalent 
series of sinusoidal variations is 
unique, Manley (1945) points out 
that this does not mean that the orig- 
inal variation must necessarily be re- 
garded as such a series. There are, in 
fact, a large class of functions (the 
orthogonal functions) which includes 
the sine function, and any member 
of this class could equally well be 
used as the basis of an alternative 
series. In some circumstances the 
choice of sine waves as the basic 
components of the complex waveform 
may be difficult to justify other than 
on the grounds of expediency. Since 
Fourier’s presentation of his first 
paper in 1807 the assumption has, 
from time to time, been queried. 
However, the fact that a simple un- 
damped vibration can be shown to be 
sinusoidal would appear to make the 
choice a reasonable one in the present 
context. 

Because of certain difficulties intro- 
duced by the fact that the micro- 
vibration waveform does not have a 
precisely regular periodicity (Wil- 
liams, 1956), it was decided that 
although in some circumstances a 
Fourier analysis might be found to be 
an appropriately exact and elegant 
method, here it could give only ap- 
proximate and unreliable results, and 
this would not justify the lengthy and 
elaborate computations involved. In- 
stead, the familiar automatic fre- 
quency analyzer expressly designed 
for the analysis of EEG waveforms 
was judged to meet the present 
requirements very fully. This in- 
strument performed a complete fre- 
quency analysis every ten seconds and 
presented it in written form on that 
part of the microvibration record to 
which it referred. 
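
The kind of information such an analyzer provides can be sketched in software. The following is only a stand-in: it correlates one epoch of the record with a sine and cosine at each channel frequency, which is a Fourier-type calculation and not a description of the analog instrument used here. The channel frequencies, sampling rate, and test signal are hypothetical.

```python
import math


def channel_amplitudes(samples, sample_rate, channel_freqs):
    """Amplitude of the sinusoidal component of one epoch at each channel frequency."""
    n = len(samples)
    amplitudes = {}
    for f in channel_freqs:
        step = 2.0 * math.pi * f / sample_rate
        re = sum(s * math.cos(step * i) for i, s in enumerate(samples))
        im = sum(s * math.sin(step * i) for i, s in enumerate(samples))
        amplitudes[f] = 2.0 * math.hypot(re, im) / n
    return amplitudes


def dominant_channel(samples, sample_rate, channel_freqs):
    """Channel frequency with the largest amplitude in the epoch."""
    amplitudes = channel_amplitudes(samples, sample_rate, channel_freqs)
    return max(amplitudes, key=amplitudes.get)


# Hypothetical 10-second epoch sampled 100 times per second.
sample_rate = 100
epoch = [math.sin(2.0 * math.pi * 10.0 * i / sample_rate)
         + 0.2 * math.sin(2.0 * math.pi * 3.0 * i / sample_rate)
         for i in range(10 * sample_rate)]
channels = list(range(2, 31))
amplitudes = channel_amplitudes(epoch, sample_rate, channels)
dominant = dominant_channel(epoch, sample_rate, channels)
```

Repeating the calculation for each successive ten-second epoch gives a running record of the dominant frequency, comparable in form to the written output described above.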

In an experimental arrangement 
resembling that described by Roh- 
racher, eight psychiatric patients pro- 
vided 2-minute records from each of 
three transducer systems, the trans- 
ducer probes being applied in random 
order to the same point near the 
middle of the left forearm extensor 
muscle. The wave analyzer record- 
ings showed unimodal frequency dis- 
tributions, the modes occurring at 
different frequencies for each trans- 
ducer. As the ranges of the distribu- 
tions could be accounted for by the 
overlap in selectivity (44%) between 
adjacent channels of the analyzer it 
was possible to conclude that in each 
case there was only one dominant 
frequency maintained and modulated 
by aperiodic impulses, or periodic 
impulses outside the analyzer fre- 
quency band. 

Having now established the exist- 
ence of a single periodic frequency 
within the microvibration frequency 
range it remains only to show that 
different transducers having different 
natural frequencies significantly differ 
in the frequencies of the microvibra- 
tion records obtained from them in 
order to confirm the “resonance” 
hypothesis and reject Rohracher’s 
assertion that there is a periodic 
physiological correlate of the micro- 
vibration frequency. 

Two of the three transducer systems 
which had already been shown to have 
unimodal frequency responses were 
used. These differed only in the 
method of mounting the transducer 
(Acos Type VP2) and were chosen on 
the basis that a noncounterbalanced 
one would have a larger apparent 
weight than a counterbalanced one, 
and, following the previous arguments 
related to Figure 3, would have a 
higher natural frequency. At the 
same time other relevant variables 
would be kept constant. 

Two groups, each of 16 psychiatric 
patients, matched for age, sex, and 
diagnosis, were compared: 


                      Mean          Standard
                      Frequency     Deviation

Noncounterbalanced    12.1          1.00
transducer
Counterbalanced       [illegible]   1.06
transducer

t = 12.8, p < 0.001 (one-tailed test)


This result accords well with the 
previous reanalysis of Rohracher’s 
data. (See Table 1.) 
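As a rough check on the comparison above, the reported t and standard deviations fix the size of the difference between the two group means. The sketch below assumes an independent-groups t test with 16 patients per group (for equal group sizes the pooled and unpooled standard errors coincide); the function name is hypothetical and is not part of the original report.

```python
import math

def implied_mean_difference(t, sd_a, sd_b, n_per_group):
    """Mean difference implied by an equal-n, two-sample t statistic."""
    # Standard error of the difference between two independent means.
    standard_error = math.sqrt((sd_a ** 2 + sd_b ** 2) / n_per_group)
    return t * standard_error, standard_error

difference, standard_error = implied_mean_difference(12.8, 1.00, 1.06, 16)
print(f"standard error of the difference = {standard_error:.3f}")
print(f"implied difference between group means = {difference:.2f}")
```

Under these assumptions the two transducers would differ by several standard deviations in mean frequency, which is the sense in which the result is said to be significant.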

The obvious experiment of taking 
simultaneous records from two differ- 
ent transducers placed near to each 
other on the subject’s arm was found 
to give rise to a curious and misleading 
result. The records obtained, al- 
though not identical, were strikingly 
similar in form and frequency, the 
similarity becoming less marked when 
the transducers were placed further 
apart. The arm must be responsible 
for supplying the common factor. 
The fact that this result does not lend 
support to Rohracher’s hypothesis, 
however, can be demonstrated by the 
simple expedient of removing either of 
the transducers when the other one 
will immediately provide a record at 
the expected frequency. The two 
transducers, together with the subject’s
arm, form a single system, the
degree of coupling between the trans- 
ducers decreasing as the distance be- 
tween them is increased. 


SOURCES OF ENERGY OF
MICROVIBRATIONS 


Because the microvibrations are due 
to a physical system being caused to 
resonate, any mechanical disturbance 
transmitted to the subject results in 
an immediately increased amplitude 
which then decays exponentially to 
its original level. However, in the 
experimental arrangement described 
here, large and frequent impulses are 
required in order markedly and con- 
tinuously to increase the amplitude 
and it is clear that very much the 
major contribution must be made 
by the subject. 
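The decay described above can be written compactly. This is an idealized sketch assuming a lightly damped linear resonator; the symbols $A_0$ (resting amplitude), $\Delta A$ (increment produced by a disturbance at $t = 0$), and $\tau$ (decay time constant) are introduced here for illustration and do not appear in the original.

```latex
A(t) = A_0 + \Delta A \, e^{-t/\tau}, \qquad t \ge 0
```

On this reading, a continuous rise in amplitude would require disturbances arriving at intervals short relative to $\tau$, which is one way to understand the need for large and frequent impulses.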

Again, because any mechanical 
impulse will increase the amplitude 
there may be contributions from 
sources other than the skeletal mus- 
cles. Rohracher has, in fact, pointed 
out that placing a transducer directly 
over a blood vessel may give rise to 
modulation by the pulse. A bal- 
listocardiographic effect sometimes 
occurs; certainly gross limb move- 
ments and respiration increase the 
amplitude. 

The most interesting source of 
energy, however, appears to be the 
tensed muscle itself. Figure 4 shows 
two simultaneous amplitude-inte- 
grated records (using simple RC 
integration), the upper from electro- 
myograph electrodes placed over the 
left forearm extensor muscle, and the 
lower from the counterbalanced crys- 
tal transducer, the probe of which 
rested between the EMG electrodes. 
The subject gripped and released a 
rubber bulb which was connected to
a mercury manometer, thereby being
enabled more easily to maintain a
constant level of tension. The two
records are very similar indeed, the 
main difference being at the point of 
relaxation. Here there is a sudden 
increase in the amplitude of the 
microvibration record which is not 
paralleled by that of the EMG. From 
what has gone before it will be seen 
that this is what would be expected 
on the resonance theory: any mechanical
disturbance, including a sudden
relaxation, will increase the amplitude
of the microvibrations. (Twelve
microvibration records taken during 
changes in the level of tension as 
indicated by the EMG or mercury 
manometer showed that on each of the 
36 occasions when there was an in- 
crease in tension there was also an 
increase in the microvibration ampli- 
tude. On each of the 12 occasions 
when there was a decrease in tension 
there was a decrease in the micro- 
vibration amplitude provided that 
the immediate increase in the ampli- 
tude due to the sudden relaxation 
was disregarded, the duration of this 
increase in part being determined by 
the degree of damping of the system 
and the time constant of the inte- 
grator.) 
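The role of the integrator's time constant can be illustrated with a discrete-time version of simple RC integration. This is a minimal sketch, assuming full-wave rectification followed by a first-order RC low-pass; the function name, sampling interval, time constant, and test signal are all hypothetical and are not taken from the original apparatus.

```python
def rc_integrate(samples, dt, rc):
    """Full-wave rectify a signal and smooth it with a first-order RC low-pass."""
    alpha = dt / (rc + dt)  # discrete-time smoothing coefficient
    level = 0.0
    envelope = []
    for sample in samples:
        # Move the output a fraction alpha toward the rectified input.
        level += alpha * (abs(sample) - level)
        envelope.append(level)
    return envelope

# Hypothetical burst: 0.5 s of alternating activity, then 0.5 s of rest (1 kHz sampling).
signal = [(-1.0) ** i for i in range(500)] + [0.0] * 500
envelope = rc_integrate(signal, dt=0.001, rc=0.05)
print(round(envelope[499], 3), round(envelope[-1], 3))
```

A longer time constant makes the integrated record rise and fall more slowly, so a brief transient at relaxation would appear to last longer in the record.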

Apart from the gross differences 
at the point of relaxation the tech- 
niques themselves differ in two im- 
portant ways: 


1. The amplification required to 
produce the microvibration record is 
less than one hundredth of that re- 
quired for the EMG record. In 
general the microvibration technique 
of measuring muscle tension displays 
the important advantage of requiring 
relatively simple apparatus. 

2. It has been shown how the pulse, 
respiration, and other remote dis- 
turbances affect the microvibration 
record. It follows that while muscle
activity outside the relatively
restricted field of the EMG surface
electrodes does not contribute to the EMG
record, the microvibration record 
must be produced by the additive 
effect of many muscles—some quite 
remote from the transducer. The 
microvibration record probably ap- 
proximates to the summed output 
from many EMG surface electrodes 
in just the same way that the output 
of each surface electrode approximates 
to the sum of many locally sensitive 
needle electrodes. It has been shown
elsewhere (Williams, 1956) how this 
capability makes it possible to dif- 
ferentiate between different psychi- 
atric groups at a high level of sig- 
nificance. 

The similarity of the integrated 
EMG and microvibration records is 
not altogether unexpected when one 
considers that any apparently steady 
isometric pull by a muscle is obtained 
by the summation of many muscle 
units, each in a different phase of 
activity (i.e., when one group is relax- 
ing, another is contracting, and so 
on), this circumstance having been 
brought about by the asynchronous 
discharge of the cells of the motor
neuron pool (Wright, 1961). It is
also important to note that, in an 
isometric contraction, the contractile 
part of the muscle shortens and 
thickens while stretching the elastic 
tissue component. It follows that 
whereas the surface EMG reflects the 
electrical aspects of the asynchronous 
activity, the microvibration resonance 
can derive energy from the mechanical 
counterpart. As discussed earlier, 
gross movements will also contribute 
to the amplitude of the microvibration 
record; in fact, attaching a suitable 
seismic transducer to a subject enables 
his general activity level to be re- 
corded in a way related to that used 
to measure the activity of a small 
animal by suspending its cage from a 
spring and recording the subsequent 
oscillations. Figure 5 shows a record- 
ing obtained by amplifying the output 
from a small seismic transducer (M. 
B. Vibramite Vibration Pickup, Type 
11) fastened to the subject's waist. 
The first and third sections of the
record show a pulse-modulated waveform
of small amplitude recorded when the
subject was standing still; the second
section shows how the amplitude
increased when the subject walked
slowly forward. An especially
important advantage of this technique
is that it requires only the addition
of a small transmitter to enable the
output from the transducer to be
telemetered; the subject then has
complete freedom of movement within
the range of the transmitter and an
easily quantifiable record of his
level of activity over many hours can
be taken.

FIG. 4. Comparison of amplitude-integrated microvibration and EMG records.

FIG. 5. The microvibration waveform as a measure of general activity: Record of the
output from a seismic transducer attached to the subject.

One further point arises from this 
analysis. When a limb or digit is 
extended and its position maintained 
only by the balanced contraction of 
agonist and antagonist, any momen- 
tary imbalance in the opposing groups 
of muscle units will result in a gross 
movement. Although these slight 
imbalances may occur erratically, 
provided that they occur sufficiently
frequently we have, according to the 
resonance theory, all the conditions 
necessary for the maintenance of a 
sustained vibration or tremor. The 
frequency of this tremor would depend 
on the physical characteristics of the 
limb rather than reflect the precise 
time of occurrence of the contractions 
of the muscle units. That this is so 
is supported by Hamoen’s (1958) 
experiments which suggested that the 
frequency of the tremor was cor- 
related with the mass and elasticity 
of the moving parts. 
THE ASSUMPTION CONCERNING “WRONGS” IN RESTLE’S
MODEL OF STRATEGIES IN CUE LEARNING1


MARVIN LEVINE

Indiana University


The assumption concerning the role of nonreinforcement in discrimination
learning in Restle’s (1962) recent model is examined. 2 experimental
phenomena, the effects of random reinforcements upon subsequent
learning, and the learning-set effect, are shown to be incompatible
with implications from the assumption.


Restle (1962) has described a model of 
strategies in a cue or discrimination 
learning situation. The situation consists 
in the presentation of a series of stimuli 
with one of two responses made after 
each stimulus presentation, and an out- 
come indicating “right” or “wrong” after 
each response. The sequence
stimulus-response-outcome defines a trial. The
model characterizes the behavior of the 
subject, both human and subhuman, as 
dependent upon a choice from a set of 
“strategies” (synonymous with the psy- 
chological usage of “hypotheses”—see 
Krechevsky, 1932; Levine, 1959). One 
or more of these strategies are correct, 
i.e., produce correct responding on every 
trial; the remainder are incorrect, leading
to errors on 50% or 100% of the trials. 
At the outset of the experiment the sub- 
ject chooses one (or more—an impor- 
tant, though here irrelevant, qualifica- 
tion) strategy at random from the set of 
strategies. The strategy selected
determines the first response and is
followed by the outcome for the first
response. Restle now makes two
assumptions about the effects of outcomes
following each trial: (a) If the response
is positively reinforced (e.g., the
experimenter says “right”) the subject keeps
the same strategy and responds on Trial
2 accordingly; (b) if the response is
negatively reinforced (e.g., the
experimenter says “wrong”) the subject
returns his strategy to the set and randomly
resamples. Strategy sampling after a
wrong, in slightly more formal terms,
is with replacement. These, in general,
are the effects of rights and wrongs
on any trial: the right maintains the
strategy, the wrong causes the subject
to start over. It is this latter assumption,
about the effects of wrong, which will
be critically examined here.

1 I am indebted to Frank Restle for making
available a prepublication copy of his
manuscript, for his helpful discussions of the
ideas contained here, and for his
encouragement of their publication.

The chief feature of this assumption is 
that a wrong starts the strategy selec- 
tion procedure afresh. When a subject 
receives a wrong on any trial, he is 
placed back in the same state he was in 
at the outset of Trial 1. In Restle’s 
(1962) words, “If the subject makes an 
error he takes a new independent sample 
from H [i.e., the total set of strategies] 
and begins the process over [334].” 
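The two assumptions can be sketched as a small simulation. This is only an illustration under simplifying assumptions that are not in the original: a single correct strategy (index 0), every incorrect strategy being right by chance on half the trials, and one strategy sampled at a time. The function and variable names are hypothetical.

```python
import random

def errors_before_solution(n_strategies, rng):
    """Count errors under win-stay, lose-resample-with-replacement.

    Strategy 0 is taken to be the single correct strategy; every other
    strategy is assumed to be right by chance on half the trials.
    """
    current = rng.randrange(n_strategies)  # initial random choice from H
    errors = 0
    while current != 0:
        if rng.random() < 0.5:  # told "wrong"
            errors += 1
            # Start over: a new independent sample from the full set.
            current = rng.randrange(n_strategies)
        # Told "right": keep the same strategy and go on to the next trial.
    return errors

rng = random.Random(0)
runs = [errors_before_solution(4, rng) for _ in range(20000)]
mean_errors = sum(runs) / len(runs)
print(f"mean errors with 4 strategies: {mean_errors:.2f}")
```

Because every wrong returns the simulated subject to the starting state, the number of errors in this simplified setting is geometrically distributed, with an expected value of n - 1 for n equally likely strategies.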

Simultaneously with the formulation of
this model, data were being gathered
which appear to be critical of this as- 
sumption. Levine (1962), out of con- 
cern for an entirely different theoretical 
treatment, investigated the effects of ran- 
dom reinforcements upon subsequent dis- 
crimination learning. Adult human sub- 
jects were presented with a sequence of 
two colored stimuli (e.g., blue or tan 
figures), were instructed to make one of 
two responses to each stimulus, and were 
told either “right” or “wrong” following 
each response. During the first phase, the
random reinforcement phase, the
experimenter applied the reinforcing events on
a prearranged schedule, i.e., the words
“right” and “wrong” were uncorrelated
with any S-R pairs. On the last trial
of this phase the experimenter always
said “wrong.” During the next phase,
which appeared without any special
announcement or break, the traditional
discrimination procedure was employed in
which the word “right” was presented
only when the correct response was made
to the appropriate color. Otherwise the
word “wrong” was presented. The
effect of zero random reinforcements (the
traditional discrimination procedure
alone) was compared, in two
experiments, to discrimination learning
following 4, 8, 10, 12, 30, and 60 random
reinforcements.

[Figure 1: trials to criterion plotted against number of random reinforcements.]

FIG. 1. The number of trials required to
reach a criterion of 15 consecutive errorless
responses in a discrimination learning problem
which followed varying amounts of
random reinforcements (the filled and open
circles represent data points from two
different experiments).

The implication of the assumption 
above, that after any “wrong” resampling 
follows replacement, is that discrimina- 
tion learning with zero preliminary ran- 
dom reinforcements should be the same 
as discrimination learning following four 
or more random reinforcements. The 
last “wrong” of the random reinforce- 
ment phase places the subject back into 
the same state as the subject just starting 
at Trial 1. Figure 1 shows that many 
more trials are required to solve the
problem by subjects having even as few
as four random reinforcements. These 
data suggest that the subject, having 
tried a strategy (in this case, responding 
systematically to color) and having been 
told “wrong,” does not replace the strategy
and start over but rather eliminates
the strategy and samples from the remaining
set, i.e., he samples without replacement.2
This is borne out by evidence
of a more informal sort. In another
experiment the subjects who had
not reached criterion after 30 discrimina- 
tion trials were interrupted and subse- 
quently told that all blue cards called for 
one response, all tan cards called for the 
other. Several of the subjects remarked 
that they had tried that at the beginning 
but that it hadn't worked. Note the im- 
plication that a strategy which hadn't 
worked was not tried again. The effect 
has been demonstrated when 0, 1 (shown 
in Figure 1), or 2 irrelevant dimensions 
have been present and under conditions 
to guarantee that the subject was always 
observing the color. 
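The implication drawn from the replacement assumption, that preliminary random reinforcements ending in a “wrong” should leave learning unchanged, can be checked in a simulation sketch. The assumptions here are illustrative rather than taken from the original study: four equally likely strategies with strategy 0 correct in the discrimination phase, “wrong” delivered on half the random-phase trials (always on the last), and incorrect strategies right by chance half the time. All names are hypothetical.

```python
import random

def errors_after_random_phase(n_random, n_strategies, rng):
    """Errors in the discrimination phase under resampling with replacement."""
    current = rng.randrange(n_strategies)
    # Random reinforcement phase: outcomes unrelated to the response.
    for trial in range(n_random):
        last_trial = trial == n_random - 1
        told_wrong = last_trial or rng.random() < 0.5
        if told_wrong:
            current = rng.randrange(n_strategies)  # full set again
    # Discrimination phase: strategy 0 is now always reinforced.
    errors = 0
    while current != 0:
        if rng.random() < 0.5:  # incorrect strategy told "wrong"
            errors += 1
            current = rng.randrange(n_strategies)
    return errors

def mean_errors(n_random, n_strategies=4, n_subjects=10000, seed=1):
    rng = random.Random(seed)
    total = sum(errors_after_random_phase(n_random, n_strategies, rng)
                for _ in range(n_subjects))
    return total / n_subjects

for n_random in (0, 4, 60):
    print(n_random, round(mean_errors(n_random), 2))
```

In this sketch the mean number of errors stays at about the same value whatever the length of the random phase, which is the prediction that the data in Figure 1 contradict.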

Another phenomenon incompatible with 
the “wrongs” assumption in Restle’s 
model is the learning-set effect. In this 
experiment half the subjects are wrong 
on Trial 1. The assumption requires 
that at least half of this subgroup be 
wrong again on Trial 2 (since these sub- 
jects “begin the process over” at the 
outset of Trial 2). Of course, it is well 
known and substantiated (Harlow, 1959) 
that human and subhuman primates can 
show virtually perfect performance on 
Trial 2, regardless of whether Trial 1 
was right or wrong. The problem raised 
by this phenomenon for Restle’s wrongs 
assumption is more complex than that 
raised by the random reinforcement
phenomenon. The latter could be handled
simply by shifting to a process that does
not involve simple strategy replacement.
The learning-set effect suggests that some
wrongs do not always cause the subject
to reject a strategy but may lead him to
an alternate form of that strategy. This
is usually what is intended by those who
describe the behavior pattern “lose-shift”
(Behar, 1961; Goodnow & Pettigrew,
1955) in which an error is a stimulus for
a very specific behavior change. In this
case the effect of “wrong” is not to
produce rejection of the strategy but rather
to indicate which responses go with
which stimuli, given that strategy.

2 Sampling without replacement is
undoubtedly still an oversimplified description
of the process. Practically all the subjects
overcome the effects of random
reinforcements and learn the discrimination within
100 trials. It would appear that the color
strategy was not completely eliminated (with
probability of resampling = 0) but was
relegated to a condition of lower probability of
resampling.

In summary, Restle’s model, which contains
some important features, notably
the elegant and intuitively appealing as- 
sumption about the effects of “rights” 
and the invariance of the theorems over 
the number of strategies sampled, also 
contains an assumption whose implica- 
tions are widely discrepant with existing 
data. “Wrongs,” contrary to the assump- 
tion made, appear to reduce the re- 
sampling probability of the negatively 
reinforced strategy. “Wrongs” may also 
function to guide behavior within a
strategy. 
COMMENTS ON “AN ANALYSIS OF GSR CONDITIONING”1


RUSSELL A. LOCKHART AND WILLIAM W. GRINGS


Critical comments are directed toward Stewart, Stern, Winokur, and
Fredman’s (1961) analysis of GSR conditioning. Their proposed criterion
for differentiating between a sensitized and conditioned GSR was
found to be inadequate because of the high correlation between measures
of these phenomena (.90). A necessary control condition was not
provided, making possible the assertion that the so-called “true CR”
was actually a sensitized spontaneous response. It is concluded that
the criterion may be useful in defining a conditioned response in long
delay intervals, but the exclusiveness and universality of application of
the criterion to the area of GSR conditioning is more limited than
implied by Stewart et al.


Current opinion as to the possibility 
of autonomic conditioning ranges from 
categorical denial (Smith, 1954) to belief 
that it is of prime importance in mediat- 
ing overt behavior (Mowrer, 1960). The 
issue is far from settled. A recent con- 
tribution to the problem area (Stewart, 
Stern, Winokur, & Fredman, 1961) pur- 
ported to review representative GSR 
conditioning studies in an attempt to 
show that “this work has probably dealt 
only with sensitization, and not with 
true conditioning.” Stewart and his 
associates do not deny the possibility of 
GSR conditioning, but argue that “true” 
conditioned GSRs can be defined only if 
criteria based on latency of response 
are imposed. 

By using a long (7.5 seconds) inter- 
stimulus interval, they proposed to 
differentiate sensitized GSRs from condi- 
tioned GSRs on the basis of a “second” 
response, found to occur during the long 
delay period prior to the UCR and 
subsequent to the “first” or orienting 
GSR to the CS. This second response 
was defined as any response during the 
conditioning trials having a latency 
greater than the range of latencies of 
orienting GSRs obtained to the CS 
during an adaptation series. Since the
frequency of these second responses was
greater during acquisition (77) than
during adaptation (13), the authors
concluded that true conditioning of the
GSR had been demonstrated.

1 Preparation of this report was supported
in part by a grant to the second author from
the National Institute of Health (M-3916).

However, demonstrating a simple in- 
crease in the frequency of some response 
from a prepairing to a postpairing series 
is an incomplete definition of condition- 
ing. Any firm inferences concerning 
conditioning should be made from dif- 
ferences in the performance of a group 
with CS and UCS paired and another 
group in which the stimuli are unpaired; 
or, from the difference between respond- 
ing to a reinforced and nonreinforced 
stimulus in the same group. Such 
control conditions allow for a more or 
less direct measurement of the degree 
of response increase attributable to 
sensitization (i.e., the increase in
responding to the nonreinforced stimulus)
and form the basis for statistical evalua- 
tion of pairing versus nonpairing modi- 
fication. 

Stewart et al. (1961) provided neither 
control condition. No mention was 
made of the fact that frequency of the 
so-called orienting response also in- 
creased from adaptation (167) to acquisi- 
tion (242). There was no statistical 
evaluation of either the increase in 
orienting GSRs or in true conditioned 
GSRs. One wonders why 13 true CRs 
were obtained during the adaptation 
period prior to stimulus pairing. The 
authors believed these to have been 
“random” responses. Yet, what they 
do not acknowledge is that random 
responses will also occur during the 
acquisition period and with a greater
frequency as a result of sensitization 
when shock is administered. 

If the reader will examine the data in 
question, he will note that the number 
of true CRs for subjects 5, 7, 13, and 17 
totals 41, or 53.2% of the total number 
of true CRs obtained from the entire 
sample of 19 subjects. In addition, 
these 4 subjects had the highest level 
of “orienting” GSRs. The other 6 
subjects (in the published data) gave 
only 13 CRs, and the 9 subjects in the 
unpublished data gave a total of only 
23 CRs during the entire conditioned 
period. If the above 4 subjects are 
excluded, the remaining subjects had 
an average of only 2.4 CRs in the series 
of 20 acquisition trials. 

If the rate of spontaneous error
responding (GSRs which meet CR criteria
on a chance basis) were only 10% (and
there is evidence to show that under
strict criteria it may reach 20% [Stewart,
1954]), then one would expect at least
two responses during the 20 conditioning 
trials to be erroneously scored as CRs. 
This observation, coupled with the 
probable sensitization accompanying 
shock administration, makes it reason- 
able to assume that the true conditioned 
GSRs observed by Stewart et al. (1961) 
may have been sensitized error responses. 
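The tallies in the two preceding paragraphs can be checked with a few lines of arithmetic. The counts below are those quoted in the text; the variable names are hypothetical, and the chance rates of 10% and 20% are the two figures mentioned above rather than estimates from new data.

```python
# Counts quoted in the text for the data of Stewart et al. (1961).
total_crs = 77
crs_top_four = 41          # subjects 5, 7, 13, and 17
crs_other_published = 13   # 6 subjects in the published data
crs_unpublished = 23       # 9 subjects in the unpublished data
trials = 20                # acquisition trials per subject

share_top_four = 100 * crs_top_four / total_crs
mean_crs_remaining = (crs_other_published + crs_unpublished) / (6 + 9)
# Responses expected to meet the CR criterion by chance alone.
expected_chance_crs = {rate: rate * trials for rate in (0.10, 0.20)}

print(f"{share_top_four:.1f}% of all CRs came from 4 subjects")
print(f"{mean_crs_remaining:.1f} CRs per remaining subject")
print(expected_chance_crs)
```

The mean for the remaining 15 subjects falls between the numbers of chance responses expected at the 10% and 20% rates, which is the basis for the argument above.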

Stewart et al. (1961) offer their latency 
criterion as a method for distinguishing 
between a sensitized and conditioned 
response. In their study, frequency of 
the first response was employed as a 
measure of sensitization and frequency 
of the second response as a measure of 
conditioning. If the latency criterion is 
an adequate means of distinguishing 
between responses modified by sensitiza- 
tion and conditioning, then one would 
expect little correlation between meas- 
ures of these phenomena. However, if 
one examines the data (Figure 1, p. 64) 
a correlation (rho) of .90 is obtained 
between the frequency of first and second 
responses. Such a correlation cannot be 
regarded as evidence for the validity of a 
method designed to distinguish between 
sensitized and conditioned GSRs. 

While Stewart et al. (1961) limited 
their comments to results obtained in the 
simple conditioning paradigm, it should
be pointed out that the discrimination 
paradigm may provide a quite suitable 
method for differentiating between sen- 
sitization and conditioning effects. The 
magnitude of response to the nonrein- 
forced stimulus is an estimate of sensiti- 
zation effects, while the magnitude of 
response to the reinforced stimulus 
contains both sensitization and condi- 
tioning components. The difference 
between the two response magnitudes, 
then, may be taken as a measure of 
conditioning independent of sensitiza- 
tion. When the magnitude of response 
to a stimulus which has been paired with 
shock exceeds that to an unpaired stimu- 
lus, the explanation of the divergence 
lies more appropriately in the fact of 
stimulus pairing than in the generalized 
effects of sensitization. Indeed, because 
differential responding can be demon- 
strated in terms of the magnitude of the 
first response, the Stewart et al. criterion 
cannot be considered a necessary com- 
ponent of the definition of a conditioned 
GSR. What Stewart et al. may have 
shown is the possible conditionability of 
a particular response occurring in a long 
delay interval. However, with appro- 
priate controls it can be shown that 
other responses, namely the first
response, can also be conditioned.2
Moreover, if the latency criterion
proposed by Stewart et al. (1961) were
accepted as the only definition of a
conditioned GSR, its applicability in
general conditioning situations would be
severely limited by the fact that it
virtually requires observation of GSR
conditioning only under long CS-UCS
interval conditions. It would be
impossible, for instance, to apply the criterion
to GSR results obtained under a short
CS-UCS interval (e.g., .5 second). That
GSR conditioning occurs under a short
CS-UCS interval has been shown by
use of the discrimination paradigm.

2 Subsequent to the writing of these
comments, a study (Leonard & Winokur, 1963)
appeared which applied the Stewart et al.
(1961) latency criterion in the discrimination
paradigm. No difference in the frequency of
the first response to reinforced and
nonreinforced stimuli was found, while the
frequency of the second response was
significantly greater on reinforced than on
nonreinforced stimuli. Such a result indicates that
the second response is a reliable phenomenon
and may be employed as one indication of
conditioning in the long interval situation. It
should be pointed out, however, that similar
differences have been demonstrated for the
first response especially when magnitude of
response is used rather than frequency (e.g.,
Grings, Lockhart, & Dameron, 1962).

In summary, Stewart et al. (1961) 
have addressed themselves to an im- 
portant problem and have suggested a 
meaningful criterion for defining a CR 
in the long delay conditioning of GSR. 
It is concluded, however, that retesting 
of their hypothesis is needed with more 
adequate controls and, further, that the 
exclusiveness and universality of appli- 
cation of the criterion to the area of 
GSR conditioning is not as great as was 
implied in the original article. 
RTT PARADIGM:
NO PANACEA FOR THEORIES OF ASSOCIATIVE LEARNING1


ROBERT J. SEIDEL2


Human Resources Research Office, George Washington University


The problem of discriminating between the all-or-none and incremental
views of learning can be attacked from at least 2 aspects: empirical and
conceptual. Given a set of experimental conditions for learning, the
empirical function of changing response probabilities can easily be
plotted. However, if one attempts to interpret theoretically toward
a more general, abstractive concept of associative learning, then many
other factors bearing on theory construction must be considered (e.g.,
the learning-performance distinction and operationism). Experiments
were reported in an extended miniature paradigm, RTT, in order to
illustrate the methodological problems involved in answering the
empirical question. Much more work is indicated prior to drawing
any conclusion regarding the conceptual nature of an associative learning
function.


Estes (1961) has pointed out that cur- 
rent linear models and his own earlier 
stimulus sampling model of learning 
(1950) are inadequate to handle the 
data obtained to date in the RTT paradigm.
He suggests that only an all-or-none
view of associative learning seems to fit
the findings. However, Jones (1962) has 
recently questioned the sufficiency of this 
miniature paradigm for studying the 
learning process, presenting data from 
an extended paradigm, RTTTT, to sup- 
port her position. 

The present paper discusses some of 
the conceptual and methodological ques- 
tions which must be dealt with before 
theoretical issues within this paradigm 
can be clarified. 


Sufficiency of the R1TT Paradigm


The adequacy of one reinforcement
prior to T1T2 for discriminating between
all-or-none and incremental theoretical
views may be considered in two facets:
the empirical, and the conceptual. The
empirical issue is quite straightforward.
Since the size of an increment should
depend on the experimental conditions
(exposure time and field, stimulus and
response complexity, number of
reinforcements, etc.), the number of Rs in Estes’
experiment may have been insufficient to
demonstrate N:C shifting.

1 Preparation of this paper was supported
by Grants M-3994 and M-5844 from the
United States National Institute of Health,
Department of Health, Education and
Welfare. The author is indebted to Eugene A.
Cogan for his helpful suggestions during the
preparation of this manuscript.

2 The experiments reported here were
conducted while the author was at Denison
University. The data were first reported at
the 1961 meeting of the Psychonomics
Society at Columbia University.

If increasing Rs prior to TT produced 
a consistently increasing N:C switching, 
such evidence would favor incremental 
theory, although the specific shape of 
the function might not be describable a 
priori. That is, since increases in R would
bring more correct associations closer to
a reaction threshold, this latent “habit
strength” would be more readily
evidenced on T2.

Of course, this argument presumes 
items of equivalent difficulty. It is con- 
ceivable, however, that easy items could 
contribute greatly to N:C switching after 
one or two reinforcements. Yet, shortly 
thereafter, they might be given correctly 
on T1 and thus eliminated from the N1
category. If concomitantly the difficult 
items had not yet been brought close to 
threshold, N:C switching could show a 
temporary decrease. Later in learning
when these difficult items increase
sufficiently in strength, the N:C switching
should spurt up again. An all-or-none
view, on the other hand, would be
restricted to predicting random
fluctuations of N:C switching around a chance
level, regardless of whether an easy or
difficult item pool were being learned.
In addition to increasing R, employing
different dependent measures
(recognition and savings) would allow
determination of the generality of the empirical,
response-probability function obtained
with a recall measure.


TABLE 1

MEAN PERCENTAGE SWITCHES (T1T2)
OF ITEMS N1:C2 RELATED TO NUMBER
OF REINFORCEMENTS (R) PRIOR
TO TESTING

          R1     R2     R3     R4     R5
Percent   16.9   36     44.7   43.5   54.8
n         25     23     20     18     12

Two experiments were conducted in
the R1 . . . RnTT paradigm to
provide some experimental answers (in a
recall setting) to the empirical questions:
(a) do N1:C2 verbal shifts increase with
increases in reinforcement and (b) do the
correct verbal responses become more
stable; i.e., do C1:C2 responses
increase as reinforcements increase prior to
testing?

The first experiment extended the
RTT paradigm to R1 through R5TT,
duplicating the intratrial exposure
conditions of Estes except that guessing was
not forced. The learning materials were
10 syllable-digit pairs (0% Glaze as
stimuli and 0-9 as responses), presented
in six random orders. There were 100
subjects selected from elementary
psychology courses at Denison University
and assigned at random to one of the
five reinforcement (R) conditions.


TABLE 2

MEAN PERCENTAGE RETENTION, C1:C2,
RELATED TO R PRIOR TO TESTING
(n = 25)

95.8   94


TABLE 3

MEAN NUMBER CORRECT ON T1 FOR R1
AND R2 IN EXPERIMENTS I AND II

R     Experiment I   Experiment II   Standard error of
                                     the difference
R1    3.9            3.0             .699
R2    6.04           5.9             .814

An overall F test (simple randomized 
analysis) of the percentage of N1 to C2
switches given in Table 1 was significant
beyond the .01 level (F = 3.64, df = 4/80).
A trend analysis of the switching data 
between groups showed a significant 
linear function; i.e., groups with more R 
trials prior to TT showed increased N:C 
switching. 
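
The direction of this trend can be checked roughly from the group percentages printed in Table 1. The Python sketch below applies the usual linear contrast coefficients for five equally spaced groups. It is an illustration only: it ignores the unequal group sizes, it does not reproduce the significance test (which requires the within-group variance), and the variable names are hypothetical.

```python
# Percentage of N1:C2 switches for groups E1..E5, as printed in Table 1.
percent_switch = [16.9, 36.0, 44.7, 43.5, 54.8]

# Linear orthogonal-polynomial coefficients for five equally spaced levels of R.
linear_coeffs = [-2, -1, 0, 1, 2]

# Linear contrast: weighted sum of the group percentages.
contrast = sum(c * y for c, y in zip(linear_coeffs, percent_switch))

# Least-squares slope of percentage on R (coefficients are the centered R values).
slope = contrast / sum(c * c for c in linear_coeffs)

print(f"linear contrast = {contrast:.1f}")
print(f"slope = {slope:.2f} percentage points per added reinforcement")
```

The contrast is positive (about 83.3), which corresponds to a rise of roughly 8.3 percentage points in switching for each additional reinforcement.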

The data presented by Estes for the 
RTT and the RRTT conditions showed 
no increments of retention as a function 
of the added reinforcement. However, 
the retention data, C1:C2 (Table 2), from
the current experiment were contrary to 
Estes’ findings and support the proposi- 
tion that increased reinforcements lead 
to increased retention. Analysis of the 
curve (Grant, 1956) indicated a signifi- 
cant linear trend. The incremental hy- 
pothesis was therefore supported both 
for the N:C switches and for retention. 

An additional 32 subjects were run in 
Experiment II under the same procedure 
as Experiment I with only R1 and R2 


TABLE 4


MEAN T1T2 RELATIONSHIP PER TREATMENT
IN EXPERIMENT II (n = 16)


Groups                  N1:C2    C1:C2

E1                      .073     .56

E2                      .138     .90
Difference (E2 - E1)    .065     .46*


* Mann-Whitney U = 69, p < .05.
conditions (n = 16 per treatment) and no
blank intratrial interval between exposed 
materials. 

Acquisition data are given in Table 3. 
The R1 and R2 treatments across the two 
experiments did not differ reliably in 
mean number correct on the first test 
trial (t tests were not significant at the .05
level). Furthermore, Table 4 shows that
increasing the number of reinforcements 
from one to two prior to testing resulted 
in a significant increase in retention. 
Although retention for corresponding
groups across Experiments I and II was
not identical, it was comparable.
The R1 condition for Experiment I was
superior to R1 in Experiment II, but the
respective R2 groups were not reliably
different. On the other hand, the in-
creased switching behavior, N1:C2, going
from R1 to R2, clearly seen in Experi-
ment I, was not obtained, apparently
because the blank interval between suc-
cessive exposures was omitted.
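
The acquisition comparison can be retraced from the figures in Table 3 by dividing each difference between experiment means by its standard error. The sketch below does this in Python. It assumes that the standard error .699 belongs to E1 and .814 to E2 (the order in which they are printed); the dictionary layout and function name are hypothetical.

```python
# Mean number correct on T1 and standard error of the difference (Table 3).
# Assumption: .699 goes with E1 and .814 with E2, following the printed order.
table3 = {
    "E1": {"exp1": 3.9, "exp2": 3.0, "se_diff": 0.699},
    "E2": {"exp1": 6.04, "exp2": 5.9, "se_diff": 0.814},
}


def t_ratio(row):
    """Difference between the two experiment means divided by its standard error."""
    return (row["exp1"] - row["exp2"]) / row["se_diff"]


t_values = {group: t_ratio(row) for group, row in table3.items()}

for group, t in t_values.items():
    print(f"{group}: t = {t:.2f}")
```

Under that assumption both ratios (about 1.29 and 0.17) fall well short of the value of roughly 2.0 needed at the .05 level with groups of this size, in line with the statement that the treatments did not differ reliably across experiments.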

The results of Experiment II suggest
caution in premature theorizing within
this deceptively simple paradigm.³ For
example, the blank interval (eliminated
in Experiment II) might suggest simply
a period for covert rehearsal of the ma-
terial, thus substantiating an all-or-none
view of associative learning where this
period is eliminated. However, if re-
hearsal were the major activity during
this intraexposure interval, then we
must still explain the fact that both re-
tention and amount learned were com-
parable across experiments; i.e., if re-
hearsal occurred in Experiment I and
could not occur in Experiment II, a de-
crease in the number of correct responses
per reinforcement from Experiment I to
Experiment II would be expected. Since
this did not happen, and only N1:C2
behavior was markedly affected, the
4-second period may be interpreted as
providing for consolidation of subverbal
S-R traces (see Brown, 1958), as well as
possible rehearsal.


³ A need for this caution in novel paradigms
has been pointed out by Battig (1962) in
recent studies dealing with the Rock para-
digm. Battig, for example, has shown, con-
trary to Rock (1957), that previously in-
correct items show an incremental effect of
repetition when compared with novel items
under sensitive testing conditions. Moreover,
with the RTT framework, Peterson, Saltzman,
Hillner, and Land (1962) have obtained data
which indicate the critical importance of the
temporal intervals between R and T1 and T2.
(It should be noted, however, that their study
did not use a separation of the R and T trials
for all items comparable to the Estes study
and the present experiments.)


Statistical Analyses 


Underwood and Keppel (1962) have 
discussed problems of differential item 
difficulty and individual differences as 
they relate to statistical artifacts. But 
there is an additional point which re-
quires consideration. The practice to date
(Bower, 1962; Estes, 1960; Jones, 1962;
Williams, 1962) has been to present find-
ings in the “tree” form, which combines
items and subjects into numbers of
“cases.” Such an analysis, allowing
several responses for a single subject,
precludes independence among these
scores. Thus, chi square tests such as
those Jones (1962, p. 160) reports must be
considered inappropriate.

The results of the current study illus- 
trate the point. The data are presented 
in the tree form in Figure 1 for compari- 
son with the Estes and Jones data. With 
the R1 proportions as a baseline, the
“chance” frequencies of N1:C2 and N1:N2
cases were compared across conditions
R2-R5 (χ² = 61.65, df = 3, p < .001); all
conditions gave significantly more N1:C2
switches than R1. Repeating the anal-
ysis with R2 frequencies as a base
yielded no differences among R2, R3,
and R4, but R5 was significantly dif-
ferent from R2 (χ² = 4.240, df = 1, p < .05).
Note that these analyses require the
paradoxical assumption of independence
of data within subjects. A more ade-
quate test, a trend analysis of variance
between groups (see p. 9), was conducted
with one N1:C2 percentage obtained from
each subject, and the chi square conclu-
sions were corroborated in this instance.
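
The independence problem can be made concrete with a small, entirely hypothetical data set in which a single subject supplies every switch. The Python sketch below compares a chi square computed on pooled "cases" with the one-percentage-per-subject summary; the data, function names, and group labels are invented for illustration and are not taken from the experiments reported here.

```python
from statistics import mean


def chi_square_2x2(a, b, c, d):
    """Pearson chi square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical subjects: (number of N1:C2 switches, number of items missed on T1).
group_r1 = [(0, 6), (0, 6), (0, 6), (0, 6)]
group_r2 = [(0, 5), (0, 5), (5, 5), (0, 5)]  # one subject supplies every switch


def pooled(group):
    """Collapse subjects into (switch cases, non-switch cases), as in the tree form."""
    switches = sum(s for s, _ in group)
    missed = sum(m for _, m in group)
    return switches, missed - switches


def subject_percents(group):
    """One N1:C2 percentage per subject, the unit used in the between-groups test."""
    return [100.0 * s / m for s, m in group]


s1, n1 = pooled(group_r1)
s2, n2 = pooled(group_r2)
chi2_cases = chi_square_2x2(s1, n1, s2, n2)
pct_r1 = subject_percents(group_r1)
pct_r2 = subject_percents(group_r2)

print(f"chi square on {s1 + n1 + s2 + n2} pooled cases: {chi2_cases:.2f}")
print("per-subject percentages, R1:", pct_r1, "mean", mean(pct_r1))
print("per-subject percentages, R2:", pct_r2, "mean", mean(pct_r2))
```

In this contrived case the pooled chi square (about 6.77 on 44 "cases") exceeds the 3.84 required at the .05 level with df = 1, although seven of the eight subjects never switched. An analysis based on one score per subject does not treat the five responses of a single subject as five independent observations.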

[Figure 1: tree diagram of T1:T2 outcomes. Cases per group: E1, 251; E2, 230; E3, 210; E4, 190; E5, 130; Estes RTT, 384.]

FIG. 1. Results of extended RTT experiments (Experiment II data in parentheses) showing
increasing proportions of N1:C2 cases as a function of increasing R prior to T1T2 (E1 through E5).
Estes’ (1960) data are presented for comparison purposes.


This is not so for the C:C cases. The
retention data of Experiment I, tabulated
in numbers of cases (Figure 1), show ap-
parent agreement with Estes’ reported
findings across the R1 and R2 condi-
tions; viz., 83% and 82% retention, re-
spectively. However, the comparison of
means given in Table 2 by use of the F
test (which takes into account correla-
tion of intrasubject data by comparing
between-subject variance to within-
subject variance) yields a consistent
increasing function for retention across
experimental groups R1 through R5.

In addition, the switching data for
N:C cases in Experiment II, given in
parentheses in Figure 1, yield a signifi-
cant chi square between the R1 and
R2 conditions. Apparently, elimination
of the 4-second intratrial interval pro-
duced the incremental effect. However,
a t test computed on the same data
(taking into account correlation) was
not statistically significant.


Chance Baseline 


Another methodological question is 
the establishment of an appropriate 
chance level from which to gauge switch- 
ing behavior between T1 and T2. Jones 
(1962) attempted to establish this level 
by using a formula for intelligent guess-
ing, g = 1/n(1 - c). Her formula, how-
ever, does not answer all the problems of
estimating an appropriate chance base
line; i.e., in the present study, the sub-
jects were not forced to guess if uncertain.
Thus, two categories of negative responses 
existed to be accounted for in chance
measurement: omissions and incorrect
responses. In addition, a number of
subjects showed a tendency to repeat
certain numbers, most often generalized
responses from a correct item to an in-
correct item within a given test trial.
Consequently, the probability of a subject
switching from N1 to C2 was not as simple
as one over the remaining number of items
(1/k); nor was it as simple as 1/k multi-
plied by the factor Jones included.
Rather, it was some joint function of the 
remaining number of items, the subject's 
awareness of this number, the total pop- 
ulation from which the items were drawn, 
as well as the subject's guessing habits
with respect to the response unit in-
volved. One suggestion for handling guess-
ing habits where guessing is forced
would be to modify Jones’ formula by a
factor which would take into account 
the subject’s generalizing from correct 
to incorrect items. 


Other Problem: Conceptual Discriminability in RTT

As noted earlier, the problem of empiri-
cally discriminating between all-or-none
and incremental functions is a matter of
conducting methodologically sound ex-
periments. The question of conceptual
inference from obtained response prob- 
abilities is, however, much more difficult 
to answer than the empirical one. One 
could interpret an obtained incremental 
function within the framework of Estes’ 
(1950) earlier stimulus sampling model. 
The stimulus is considered to be a popu- 
lation of elements with the premises 
that: (a) with each reinforced trial, more 
stimulus elements from the population 
are conditioned; and (b) with two suc- 
cessive test trials, the probability of more 
conditioned stimulus elements being 
sampled per item is greater than with 
one test trial. Thus, increased switching,
N1:C2, with increased R could be ac-
counted for by a stimulus element sam-
pling model. The probability of a
correct response on T2 following a cor-
rect response on T1 (retention) would
also increase as a function of sampling
increasing proportions of conditioned
stimulus elements, thereby making ob-
served retention increments, C1:C2, con-
sistent with the theory.
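
Premise (a) can be given a minimal numerical form. The Python sketch below assumes the familiar linear version of the sampling model, in which each reinforcement conditions a fixed fraction theta of the still-unconditioned elements, and assumes further that the samples drawn on T1 and T2 are independent. The value theta = .3 and the function name are hypothetical choices made only for illustration.

```python
def conditioned_proportion(r, theta=0.3, p0=0.0):
    """Proportion of stimulus elements conditioned after r reinforcements.

    Uses p_r = 1 - (1 - p0) * (1 - theta) ** r, where theta is the assumed
    fraction of unconditioned elements conditioned on each reinforced trial.
    """
    return 1.0 - (1.0 - p0) * (1.0 - theta) ** r


proportions = [conditioned_proportion(r) for r in range(1, 6)]

for r, p in enumerate(proportions, start=1):
    # With response probability equal to the conditioned proportion and an
    # independent sample on each test, the chance of a correct response on T2
    # equals p whether the T1 response was N or C.
    print(f"R{r}: proportion conditioned = {p:.3f}")
```

In this simplest form the probability of C on T2 is the same after N on T1 as after C on T1, so the sketch shows only the direction of the predicted change: both switching and retention rise as R increases.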


The Verbal Threshold Concept (Vr) 


To expand this argument, we may
hypothesize that, in a verbal learning ex-
periment, there are two levels of process:
verbal (e.g., overt recall) and subverbal
(inferred learning). Further, learning
increments are a direct function of the
increase in proportion of conditioned to
unconditioned elements for each item.
(These, in turn, increase as R increases.)
Assuming that a sufficient number of
elements must be conditioned subverbally
before a correct, overt, verbal response on
T1 is given, and assuming Vr for each
subject on a given item, it would be pos-
sible to estimate the degree of incremental
strength required to show learning, N to
C, from T1 to T2. This strength would
be inversely related to the distance be-
tween Vr and the proportion of stimulus
elements conditioned subverbally prior
to testing. Thus, increases in verbal
switching behavior, N1 to C2, would not
be likely to occur until a sufficient pro-
portion of appropriate stimulus elements
had been conditioned. Retention incre-
ments, C1:C2, would also be interpreted
as a function of the increases in the pro-
portion of elements conditioned lying
above the Vr and between it and the
asymptote of the curve.


[Figure 2: axes labeled Reinforcement Trials and Strength of Association.]

FIG. 2. Illustration of an element-sampling basis for incremental switching behavior
and retention as a function of increased reinforcements.

This hypothesized relationship is illus-
trated in Figure 2. As the incremental
strength increases and the discrepancy
from the verbal response threshold de-
creases, the response probability of nega-
tive to positive shift from T1 to T2 should
increase, representing changes in the
proportion of conditioned to uncondi-
tioned elements in the stimulus popula-
tion (x/S). This interpretation implies
that as R increases from 1 to n: (a) the
amount of N1:C2 shifts increases, and
(b) the correct responses become more
stable (C1:C2 also increases).
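
The role of the verbal threshold can be illustrated by adding one assumption to the sampling sketch: an overt correct response is given only when the sample drawn on a test trial contains at least a fixed number of conditioned elements. The Python code below is a hypothetical illustration, not a fitted model. The sample size of 10, the threshold of 6 elements (standing in for Vr), theta = .2, binomial sampling, and all names are assumptions.

```python
from math import comb


def conditioned_proportion(r, theta=0.2):
    """Proportion of elements conditioned after r reinforcements (linear form)."""
    return 1.0 - (1.0 - theta) ** r


def p_overt_correct(p, sample_size=10, needed=6):
    """Probability that a sample holds at least `needed` conditioned elements.

    `needed` plays the part of the verbal threshold Vr; sampling is binomial
    with probability p that any sampled element is conditioned.
    """
    return sum(
        comb(sample_size, k) * p ** k * (1.0 - p) ** (sample_size - k)
        for k in range(needed, sample_size + 1)
    )


# Probability of an overt correct response on a test trial after R = 1..5.
# With an independent sample on T2, this is also the chance that an item
# missed on T1 is given correctly on T2.
switch_prob = [p_overt_correct(conditioned_proportion(r)) for r in range(1, 6)]

for r, prob in enumerate(switch_prob, start=1):
    print(f"R{r}: P(correct on a test trial) = {prob:.3f}")
```

Under these assumptions the probability stays close to zero after a single reinforcement and climbs steadily thereafter, which is the qualitative pattern the threshold interpretation requires: little switching until the conditioned proportion approaches Vr, then increasing switching and increasing stability of correct responses.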

However, these predictions are not ex- 
clusive to the all-or-none theory. A 
Hull-type incremental curve (e.g., Hull, 
1952) of habit strength, with a response 
oscillation principle, could be applied. 
Indeed, Underwood and Keppel (1962) 
raised the not uncommon learning- 
performance distinction as an answer to 
Estes’ (1960) recent statement, and used 
Hull’s approach to illustrate their point. 


However, the stimulus-sampling model 
has the advantage of not being restricted
to implying, as do Underwood and
Keppel (1962, p. 3), that “Zero prob-
ability is set at reaction threshold, . . .”
in the R1T1T2 paradigm. As Figure 2
illustrates, as soon as some elements have
been conditioned, the incremental
strength is greater than zero. Thus,
there is at least some likelihood that suc-
cessive sampling of the stimulus popula-
tion will include the conditioned ele-
ments, and that this will be reflected in
at least a small degree of N1:C2 switching.

And, if we agree with Estes, Hopkins, 
and Crothers (1960) and Jones (1962) 
that the subject may recognize somehow 
when he is correct on a test trial, then, 
given this same small degree of condi- 
tioning, we should expect greater N to C 
switching by simply increasing the num- 
bers of test trials. This, in fact, is what 
Jones (1962) obtained. 

Nevertheless, even granting the em- 
pirical applicability of the stimulus- 
sampling approach outlined in the fore- 
going discussion, the RTT paradigm 
provides no inherent sensitivity for 
uniquely determining the character of 
associations amongst the subverbal 
elements. 


Response Probability versus Conceptual 
Inference 


Within the limits of the present study, 
the obtained increase in response prob- 
abilities, with recall as the dependent 
variable, favors an incremental hypothe- 
sis. However, within the context of the 
molar-molecular (verbal-subverbal) con-
ceptualization discussed earlier (see Fig-
ure 2), one could still ask whether associ-
ative learning would be all-or-none if the
focus were on smaller elements (e.g., be-
low recall level). Stated operationally,
the question could be phrased, “Would
the associative learning process be re-
vealed as all-or-none if a more sensitive
dependent variable were chosen?” Estes
(1960) has asserted this question has
“. . . no logical bearing upon conclu-
sions . . .” concerning the particular de-
pendent variable studied in a given ex-
periment. Thus, he implies that, while
a more sensitive measure such as recogni- 
tion may or may not show an incremental 
function, there is no logical demand that 
associative learning for recognition be 
related to associative learning for recall. 
However, the question might be put 
more appropriately by proposing that 
the “logical bearing” is related to one’s 
choice of unit for studying the hypothe- 
sized learning process (i.e., the elements 
of association). Estes (1962) has clearly 
stated his reluctance to deal with the 
general conceptual problem, “. . . since 
the [molecular] elements are generally 
assumed to be large in number, response 
probability can vary in so nearly a con- 
tinuous fashion that no direct test . . . 
[pp. 122-123]” of incremental versus mo-
lecular all-or-none theory is possible.
Nevertheless, a meaningful distinction 
between a specific term defined by a par- 
ticular set of experimental conditions and 
a more pervasive concept may be drawn. 
The conceptual problem implied may 
be not unlike the controversy surround-
ing the notion of “perceptual defense.”
Eriksen (1956) and Bricker and Chapanis 
(1953), among others, clearly demon- 
strated that the inferred concept de- 
veloped out of dependent variables with 
differing degrees of sensitivity. Estes’ 
views leave the way open for a similar un- 
parsimonious proposal, that we have at 
least three kinds of associative learning: 
of recall, of recognition, and of savings.⁴
Confusion could easily develop from a 
mixture of two different kinds of theo- 
retical development. Although a minia- 
ture system (e.g., a given statistical 
learning model) may be very effective 
predictively for studying a given de- 
pendent variable in a specific setting, 
this predictive efficiency must be clearly 
separated from the model’s integration 
with a broad theory of associative learn- 
ing in general. 


⁴ A fourth category may be implied in a
paper (Eimas & Zeaman, 1963) published 
while this paper was going to press. Speed 
of response supported an incremental inter- 
pretation of associative learning while recall 
suggested an all-or-none process. 
We propose that it is more profitable in
the present case to assume that there are 
three crude measures by which to infer a 
single associative learning process, rather 
than three learning processes. We can 
only tentatively conclude that, with a 
given dependent variable and current 
measuring instruments, the process seems 
to be of such and such a nature, retain- 
ing a clear distinction between obtained 
response probabilities and inferred, un- 
derlying process (or possibly processes). 